GEATbx: | Main page Tutorial Algorithms M-functions Parameter/Options Example functions www.geatbx.com |
In the preceding chapters individual methods and operators were described. In this chapter I shall explain how complete optimization algorithms can be created from these operators. Each of these optimization algorithms represents an evolutionary algorithm.
As there is a large number of problem classes, and as a different optimization procedure is best suited to each problem class, I would like to make some recommendations here which often hold good. These recommendations are based both on my experience with the optimization of various problems, as well as on other authors' results taken from the relevant literature.
By using this approach the user is provided with insights into the specific properties of the operators and the dependencies between the operators. The user can better understand how the interplay of these properties and dependencies defines an optimization algorithm with certain properties. Based on this, is it easier to adapt the proposed algorithms to new problem areas. Moreover, it can be better estimated why an algorithm has difficulties or fails when faced with a particular problem.
Besides the choice of an optimization algorithm (combination of operators), the behavior of the optimization algorithms can be controlled via a number of options (also termed parameters). Since a lot of the options of an optimization algorithm cannot be determined exactly, in most cases a range for their values or a dependency of these values on other factors are specified. This renders the optimization algorithm more adaptable to different problem sizes and makes it easier to recognize which mutual dependencies have to be considered.
In the case of some procedures and operators, the options can be stated relatively independently of the problem which has to be solved. These procedures and the corresponding options will be explained and described in section 10.1.
For application to specific problem classes, specific operators and their options will be presented in section 10.2-10.5.
The specific problem classes can be divided into the following groups and variants:
When describing optimization algorithms for the specific problem classes, I shall illustrate only the particular operators together with their options. In every other case the procedures described in section 10.1, together with their options, will be applied.
The values suggested in this chapter for the individual parameters/options can, of course, be changed. The user should, however, then know what effect these changes have and whether they necessitate changes to other parameters. This is usually only possible after a relatively long period of practice with and examination of the way the individual operators work as well as the function and combined effects of the parameters.
For a more detailed explanation of the significance of the individual operators and their parameters refer to chapter 2 and chapter 8.
10.1 Generally Adjustable Operators and Options |
This section describes procedures and operators for which the options can be adjusted relatively independently of the problem which has to be solved. With these adjustments a robust working is provided for many problems which have to be solved.
As explained in section 3.1, a rank based procedure (ranking) should always be employed for fitness assignment . All of the procedures presented in sections 3.3-3.7 can then be applied for selection . The selection procedure used is less decisive than the value of the corresponding option - the selection pressure. A comparison of the procedures and an examination of their differences was made in section 3.8.
Using selection pressure values in the area of 1.5 to 2 worked well in many cases. A higher selection pressure should only be applied in the case of very large populations (over 500 individuals).
The generation gap option has a greater influence on the course which the optimization takes. This option states how many offspring will be produced in comparison to the number of individuals in the population. A generation gap of less than 1 ensures that less offspring are produced than there are individuals in the population. As a result, some of the parents survive into the next generation, in this case the best individuals so far. This conforms to an elitist strategy . With a generation gap of exactly 1, exactly as many offspring as parents are produced and the offspring replace all the parents. A generation gap of a little less than 1 (0.8-0.99) ensures that very good solutions which have been found are only replaced by even better solutions.
Where, in the following, several procedures or variants of operators are described for a problem class, these can be applied together during an optimization run. This takes place via the application of different strategies as described in section 9.1. This application of different strategies allows the simultaneous deployment of various strategies. In this way, one can investigate which of these strategies is successful for the problem to be solved. It is then sufficient, during subsequent optimization, to limit oneself to the deployment of these successful strategies.
The simultaneous application of several mutation operators or letting these work with different parameters is an example of this. As a result, various search strategies are applied simultaneously. This often leads to better results. For example, three subpopulations can be used, whose parameters only differ in the mutation range (large, medium, small) - see section 5.1. In this way a rough, a medium and a fine search are carried out simultaneously. The rough search is usually successful at the beginning of a run and the fine search at the end.
Due to the additional deployment of competition between the subpopulations, section 9.2, a simultaneous and efficient distribution of the computational resources between the strategies, in favor of successful strategies, is achieved. This is then particularly noticeable if different strategies are successful at different times during an optimization run. The strategies which are most successful at the respective times are allocated more resources and can, as a consequence, step up their search for better solutions. As soon as another strategy, and thus another subpopulation, becomes more successful, it is allocated more resources (in this case more individuals) and is able to carry out its search more effectively.
Both of these concepts, application of different strategies and the extension to competition between strategies, can be applied universally. They prove particularly useful when searching for suitable procedures and operators for the solution of new systems whose behavior is not yet so well known. By applying these concepts, particularly in the initial phase of work on a system, better results are achieved more quickly than when different strategies are tried out consecutively. Moreover, it is quite possible that one single strategy is not successful, whereas strategies being run together are successful within a relatively short space of time.
Both methods (application of different strategies and the extension to competition between strategies) are based on the subdivision of the population into subpopulations, the regional population model (see section 8.3). The development in the subpopulations takes place separately for a certain isolation period. From time to time an exchange of (good) individuals is made between the subpopulations. In this way, information found in one subpopulation also reaches the other subpopulations.
When applying the regional model, one usually works with a migration time of 20 (up to 40) generations, independently of how long a run actually needs in order to achieve results. If more than 1000 generations are being worked with, the migration time can be raised to every 50 generations, for short runs of around 50 generations a migration time of 5-10 generations is better. The migration rate (proportion of the population to be exchanged) is usually set at 10%, the migration structure at unlimited (exchange between all subpopulations - complete net). More detailed information on the parameters is provided in section 8.3.
The explanations in this section were able to show that many options of an evolutionary algorithm can be adjusted to very sensible and robust values, without having to be readapted to every problem.
At this point I would like to provide a short summary of the procedures explained so far and of a robust parameterization:
This covers the generally applicable explanations and comments. The information in the following sections is related to specific problem categories.
10.2 Globally Oriented Parameter Optimization |
A large part of the technical application problems concerns parameter optimization problems with real and integer variables, for which there is usually no (strong) correlation between the variables present. In this section I shall present those evolutionary algorithms which are especially suited to the globally oriented optimization of these problems. Here a globally oriented search is defined as one in which no assumptions are made as to the type of the target function and the search is mainly carried out along the co-ordinate axes. Each variable is changed separately.
Almost all recombination operators can be used (including those for binary variables).Discrete recombination (recdis) is the most favourable and (less often) line recombination (reclin).
For integer variables line recombination and intermediate recombination (recint) can only be used to a limited extent.
The recombination rate should, as a rule, be set at 1.
The mutation operators available for real (mutreal) and integer variables (mutint), section 5.1, can exhibit very different behavior as a result of appropriate parameterization. For this reason, there is only one operator per variable type, whereby both work (almost) identically. An adaptation to the application case (rough mutation up to fine mutation) is achieved by the continuous adjustment of the corresponding options (and thus also the corresponding search strategy).
For a rough mutation, a mutation range of 0.2-0.05 is used, for a fine search down to values of 10^{-8} and lower. The mutation range defines the value of the largest mutation step in relation to the definition range of the respective variable/parameter.
A mutation precision of 16 (range of 8-20) achieves a sensible distribution of the mutation steps (mostly small mutation steps, large steps only occasionally).
A possible parameterization of the mutation for real variables using 4 different strategies could involve mutation ranges of [0.1, 0.03, 0.001, 0.0003], all with a mutation precision of 10. Thus, a rough screening of the search area is connected with a fine search. This is an approach which has proved itself in many practical applications.
The mutation rate should be set at (1/number of variables) and should only be increased in exceptions or in the case of selection pressure larger than 2.
The parameterization of the mutation operator usually has the greatest influence on the effectiveness and thus the course of the optimization. For this reason, the search for an effective algorithm for the solution of the current problem should be concentrated on this operator. By using different mutation ranges in the individual subpopulations (and thus the application of different strategies), an adjustment to the problem can be carried out quickly.
The high-level toolbox functions for globally oriented parameter optimization are tbx3real (real-valued representation) and tbx3int (integer-valued representation).
10.3 Locally Oriented Parameter Optimization |
In this section I shall present those evolutionary algorithms which are particularly suited to the locally oriented optimization of real variables. This comprises, generally speaking, application problems for which there is a strong correlation between the (real) variables. Due to this, the variables have to be changed simultaneously (co-ordinated) and it must be possible to search in all directions within the search area.
To this end, those search directions which are most likely to be successful have to be "learned". Special mutation operators, originating from the evolution strategies area, section 5.3, are available for such problems. These attempt to learn the direction of an improvement and thus render a goal-oriented search possible.
However, the application of these operators is limited to smooth functions and low-dimensional problems (the larger the problem is, the longer the learning of a direction takes). If a target function is noisy, it is highly likely that these evolutionary algorithms will get stuck in the next local minima. In exactly the same way, the algorithms have almost no chance of getting out of a local minima again. This makes them unsuitable for multi-modal problems. For more detailed information on the application of these strategies refer to [Ost97] and [Han98].
These procedures (usually) do not involve recombination.
Special mutation operators are available (mutes1, mutes2) which originate from the evolution strategies area, section 5.3. Only the size of the initial step is specified for these operators. Subsequently these functions carry out an independent adaptation of the step sizes or of the search direction.
The mutation range is no more than a specification for the range of the initial step size. Values of 10^{-3} to 10^{-7} (related to the definition range of the individual variables, as this is implemented in the GEATbx) have worked well. The determination of a sensible initial step size is problem-specific and cannot be done easily.
It is also possible to work with different strategies when using these operators. These would differ in the mutation range (e.g. large, medium, small). Each of the strategies (subpopulations) therefore starts with a different initial step size range. However, as soon as the adaptation of the step sizes has taken place, the different mutation ranges no longer have any influence. This, consequently, only conduces to the determination of sensible or successful initial step sizes.
It should be noted that these procedures for locally oriented optimization only work with a small number of individuals and that a lot of offspring are produced, of which only the best are added to the population.
In order to achieve this we have a choice of two possibilities. On the one hand, the population can contain few individuals which each produce a lot of offspring. Only the best of these offspring replace the parents and form the new population. For this, the following parameters for population size and selection should be used:
On the other hand, one works with a larger population size, in which only the very best individuals produce offspring. Here, (almost) all offspring replace the parents and form the new population:
The first variant is based on the original "definition" of these procedures and has its complete justification. In the case of the second variant, however, there are some small advantages, which have particularly come to the fore during practical application. Firstly, all individuals produced (offspring) can be integrated in the (later) evaluation as they are included in the population for at least one generation (in the case of the first variant only a small part of the offspring produced are included in the population). Furthermore, in the case of the second variant, an elitist selection can be defined very easily, by which the hitherto best individual survives in the population. For this one simply has to define a generation gap of somewhat less than 1 (e.g. 0.99).
The high-level toolbox function for locally oriented parameter optimization is tbx3es1.
10.4 Parameter Optimization of Binary Variables |
The parameter optimisation of binary variables is applied, above all, for classic genetic algorithms. Here, the real or integer variables are transformed into a larger number of binary variables via discretization. The genetic algorithm then works on these binary variables. Before the objective function is calculated, the binary variables have to be decoded into their original format.
As these genetic algorithms are still in use, a number of corresponding operators are available. Moreover, there are some real-world problems which use binary variables directly.
All recombination operators for binary variables can be used. Discrete recombination (recdis), which is identical to uniform crossover, and the operators with reduced surrogate (recsprs, recdprs, recshrs,) are favourable. The recombination rate is usually set at 0.7 in the relevant literature.
An operator (mutbin) is available for the mutation of binary variables. Only the mutation rate can be regulated for this operator, there can be no other possibilities for influencing binary variables. The mutation rate is usually set at (1/number of variables) and is only increased in exceptional cases.
The high-level toolbox function for parameter optimization of binary variables is tbx3bin.
10.5 Combinatorial Optimization |
The evolutionary algorithms presented so far are meant to be used for parameter optimization. Beside that, there is a further problem category of considerable size; that of combinatorial optimization. Problem categories that are known within combinational optimisation are travelling sales person TSP, quadratic assignment problem QAP, production scheduling (e.g. job shop scheduling JSS) and vehicle routing problem VRP. These problems are often called ordering or permutation problems.
At this point, however, two restrictions must be placed. There exists an extremely large number of application fields for combinational optimization, all of which call for a slightly different procedure. Especially with regard to solving substantial problems special procedures and operators must be applied in order to achieve a result with justifiable effort. This cannot be covered comprehensively within this section.
Some information and hints shall be provided here that have proved themselves within the scope of this work. These are able to show how the `new' application of combinational optimization falls into line with the structure of evolutionary algorithms used so far and is correspondingly filled with the specific operators. The statements from section 10.1, concerning the application and parameterization of the generally adjustable procedures and operators can be directly applied here.
Within the scope of the GEATbx two recombination operators for combinatorial problems have been used: partial matching crossover PMX (recpm) and generalized position recombination GPX (recgp). These have been designed especially for the handling of sequence / ordering / permutation problems. All of the operators ensure the validity of the individuals generated by recombination.
The fundamental difference between the operators lies in the information they retain during recombination. Three properties are important for combinatorial optimization: the neighbourhood relation between the variables (relative relation), the absolute relation of the variables, and the absolute position of the variable. As regards the TSP the relative relation is crucial, the information on the absolute position is not. In the case of the QAP the absolute position of the variables is decisive but not the neighbourhood relation to other variables. As for the JSS again the absolute relation is important. The problem to be solved has to be examined with regard to which information should be retained or to which of the problem categories it belongs.
GPX particularly retains the absolute relation of the variables, but also shows good behaviour in retaining the relative relation. PMX is best suited to retaining the relative relation of the variables when working with locally optimized individuals.
By applying various strategies one can also quite simply determine which of these operators (or further recombination operators for combinatorial problems) is best suited for the solution of a problem. The subpopulations use one of the recombination operators each. The best subpopulation then provides an indication of which of the operators used should be applied for further optimization runs. Quite often this might be more than one operator.
The recombination rate is mostly set to values of 1 or a little below 1.
There are several operators available for mutation (swap mutation (mutswap), move mutation (mutmove), invert mutation (mutinvert), scramble mutation (mutrandperm)).These mutation operators can be adjusted to mutation steps of varied sizes by specifying the mutation range and the mutation precision. This leads to a functionality in terms of large and small changes. The mutation range determines the maximum distance between the variables to be changed within the individual. The mutation precision serves to distribute the mutation steps in this area for a distribution between large and small mutation steps.
The mutation rate serves to control the number of changes per individual. In most cases appropriate behavior should be obtained with a mutation rate of (1/number of variables), which corresponds to e.g. an exchange/swap of two variables or the move of one variable.
The high-level toolbox function for combinatorial optimization is tbx3perm.
10.6 Parameter Optimization of Variables of different Representations |
I would like to end this chapter by looking at a problem which often occurs in practice and which the previous sections on parameter optimization in this chapter have in common. Up to now it has been assumed that all the variables of a problem to be solved exist in the same representation. This is, however, not always the case. When solving some problems in practice variables of differing types have to be dealt with simultaneously. This means that some of an individual's variables are real and others are integer or binary.
As previously mentioned, each variable representation has different operators which are best suited to it. In addition, certain singularities have to be taken into account for each data type. A way has to be found to optimize all the variables simultaneously despite these different requirements.
There are two procedures for the processing of problems with variables of different representations:
If all the variable types are converted into one representation, the different combinations of types present must be considered and it must be decided into which representation they will be converted. In principle, it is possible to convert all the variables into each representation. Some of the advantages and disadvantages, as well as what should be taken into consideration, will be explained in the following.
First I would like to look at the combination of integer and binary variables as it is possible to do this very simply and without restrictions. An integer variable within the limits [0, 1] is equivalent to a binary variable. The evolutionary operators for integer variables work well with these binary variables. The settings for integer variables are used, see section 10.2.
If real and integer/binary variables are present, the integer representation can be used. In this case there are no restrictions for binary and integer variables. These can retain their original representation.
For real variables, on the other hand, a conversion or adaptation must take place. If the definition range of the real variables is very broad and the limitation to integer values does not cause any problems, this is the best method to choose. If, however, a higher accuracy of the real variables' values is necessary, then the real definition range must be discretized. These discrete values can then be mapped onto integer values which are used during optimization.
This conversion or adaptation can be achieved by means of a relatively simple transformation of the corresponding variables in the objective function of the problem. In doing so, one must determine with which accuracy each of the real variables is to be optimized. This accuracy can be determined by using a scaling factor for each variable. A scaling factor of 10 means that this variable is processed with a minimum resolution of 0.1 (^{1}/_{10}). Correspondingly, a scaling factor of 200 signifies a minimum resolution of 0.005 (^{1}/_{200}).
Subsequently, this is used as a basis to increase the definition range of the variables (for optimization) by this accuracy or scaling factor. The evolutionary algorithm and predetermined operators work with this definition range and integer variables. Before each evaluation of the objective function, the integer variables are converted with the determined scaling factor and are then available as real variables in the desired resolution.
I would like to illustrate the procedure described using two examples. In the first, a real variable is used with a definition range of [-1, +1]. An accuracy of 0.01 is desired for this variable. For this, the original limits of the definition range of this variable are multiplied by a scale factor of ^{1}/_{0,01}=100. During optimization this variable is treated as an integer within the limits [-100, +100]. Before the target function is evaluated this variable is divided by the scaling factor of 100. The real variable which is to be used is available again with the desired accuracy.
In the second example we have 3 variables, the first and third as real variables, the second as an integer variable. The first variable should have a minimum resolution of 0.2, the third one of 0.005. The scale factor for all 3 variables must therefore be determined at [5, 1, 200]. By multiplying by this scaling factor the definition range for these variables is increased (or maintained for the integer variable). Before the evaluation of the objective function, division by the scaling factors results in a conversion of the integer variable values into real variable values of the desired resolution within the problem specific definition range.
The application of this approach is very flexible. On the one hand, the desired resolution of the variables can be influenced exactly. On the other, all the deployment possibilities for the evolutionary operators for integer variables continue to be available.
The only disadvantage is that, in the case of the formerly real variables, the direct relationship between the variables' values during optimization and the problem specific value is lost. The variable values from optimization first have to be converted with the scaling factor.
This disadvantage can, however, be put up with and does not outweigh the good manageability of this method. When solving systems with variables of different representations, the use of integer variables for optimization has proved itself and is recommended here for application.
For the combination of operators and their parameterization to evolutionary algorithms, the details given on parameter optimization of the corresponding representation in the previous sections, especially section 10.2, hold good. I shall, therefore, not go into it further at this point.
Besides a conversion of real variables into an integer representation, binary representation can also be used as the smallest common denominator. For this, all integer and real variables are transmuted into binary representation with which the evolutionary algorithm then works. This approach corresponds to classical genetic algorithms.
The desired resolution of each variable has to be predetermined for this conversion too. In the case of integer variables this occurs straightforward, an integer variable within the limits [10, 500] can be coded via log_{2}(491) < 9 binary variables. In the case of real variables the data given above apply. One must, however, take into account that base 2 is being used. For a real variable, which is decoded with 10 binary variables, this means a resolution of around 0.001 (1/2^{10}) with respect to the definition range (in the case of linear scaling).
As pointed out in section 10.4, many users still work with this method. It has both historical importance and justification. It has been shown in many applications, however, that the use of evolutionary operators available for integer and real variables result in more effective evolutionary algorithms. Nevertheless, the decoding of binary variables into integer (bin2int) and real variables (bin2real) is supported in the GEATbx.
In principle, it is also possible to work with a real representation and to carry out a limitation of the variable values in the objective function (rounding off the variables to the next integer value).
However, this can have the consequence that a mutation of originally binary or integer variables has no effect (due to the subsequent rounding off). This can be particularly extreme in the case of binary variables because only a change of, on average, a quarter of the definition range results in a change in the binary variable value. In the case of integer variables this problem becomes less serious, the broader the definition range of the corresponding variable is.
For this reason, real representation should only be used in special cases or when combining integer values with a broad value range (difference between lower and upper limit is greater than 1000) and real variables.
10.7 Summary |
In this chapter I have presented details on the combination and parameterization of powerful evolutionary algorithms. For this, a subdivision into frequently occurring problem classes was undertaken (parameter optimization of real, integer and binary variables as well as sequence or ordering optimization). Details were provided at the beginning of the explanations which can be applied similarly to most of the problem categories (fitness assignment, selection, application of the regional population model with different strategies, competition). Thereafter, a compilation of the operators with robust parameters, which are specific to each problem class, were presented.
There are several possible solutions to problems with variables of different representations. According to each type of variable the advantages and disadvantages of the conversion and application of the different representations were addressed. During application the conversion of real variables into integer representation and its use for the optimization has proved successful and is thus recommended.
The evolutionary algorithms presented in this chapter provide the basis for their application to systems of various problem classes. After the analysis of the system to be solved the user may select the appropriate evolutionary algorithm. An adaptation is only necessary if there are problem specific requirements.
GEATbx: | Main page Tutorial Algorithms M-functions Parameter/Options Example functions www.geatbx.com |