Evolutionary Algorithms 3 Selection

In selection the offspring producing individuals are chosen. The first step is fitness assignment. Each individual in the selection pool receives a reproduction probability depending on the own objective value and the objective value of all other individuals in the selection pool. This fitness is used for the actual selection step afterwards.

Throughout this section some terms are used for comparing the different selection schemes. The definition of these terms follow [Bak87] and [BT95].

In rank-based fitness assignment, the population is sorted according to the objective values. The fitness assigned to each individual depends only on its position in the individuals rank and not on the actual objective value.

Rank-based fitness assignment overcomes the scaling problems of the proportional fitness assignment. (Stagnation in the case where the selective pressure is too small or premature convergence where selection has caused the search to narrow down too quickly.) The reproductive range is limited, so that no individuals generate an excessive number of offspring. Ranking introduces a uniform scaling across the population and provides a simple and effective way of controlling selective pressure.

Rank-based fitness assignment behaves in a more robust manner than proportional fitness assignment and, thus, is the method of choice. [BH91], [Why89]

3.1.1 Linear ranking

Consider Nind the number of individuals in the population, Pos the position of an individual in this population (least fit individual has Pos=1, the fittest individual Pos=Nind) and SP the selective pressure. The fitness value for an individual is calculated as:

3.1.2 Non-linear ranking

A new method for ranking using a non-linear distribution was introduced in [Poh95]. The use of non-linear ranking permits higher selective pressures than the linear ranking method.

3.1.3 Comparison of linear and non-linear ranking

The probability of each individual being selected for mating depends on its fitness normalized by the total fitness of the population.

Table contains the fitness values of the individuals for various values of the selective pressure assuming a population of 11 individuals and a minimization problem.


	fitness value (parameter: selective pressure)
	linear ranking		no ranking	non-linear ranking
objective value	2.0	1.1	1.0	3.0	2.0
1	2.0	1.10	1,0	3.00	2.00
3	1.8	1.08	1,0	2.21	1.69
4	1.6	1.06	1,0	1.62	1.43
7	1.4	1.04	1,0	1.99	1.21
8	1.2	1.02	1,0	0.88	1.03
9	1.0	1.00	1,0	0.65	0.87
10	0.8	0.98	1,0	0.48	0.74
15	0.6	0.96	1,0	0.35	0.62
20	0.4	0.94	1,0	0.26	0.53
30	0.2	0.92	1,0	0.19	0.45
95	0.0	0.90	1,0	0.14	0.38

3.1.4 Analysis of linear ranking

Where proportional and rank-based fitness assignment is concerned it is assumed that individuals display only one objective function value. In many real world problems, however, there are several criteria which have to be considered in order to evaluate the quality of an individual. Only on the basis of the comparison of these several criteria (thus multi-objective) can a decision be made as to the superiority of one individual over another. Then, as in single-objective problems, an order of individuals within the population can be established from these reciprocal comparisons - multi-objective ranking. After this order has been established the single-objective ranking methods from the subsection 3.1 can be used to convert the order of the individuals to corresponding fitness values.

Multi-objective fitness assignment (and with it multi-objective optimization ) is concerned with the simultaneous minimization of NObj criteria f_r, with r = 1, ..., NObj. The values f_r are determined by the objective function, which in turn is dependent on the variables of the individuals (the decision variables).

A straightforward example should serve as the motivation for the following considerations. When objects are produced, the production costs should be kept low and the objects should be produced quickly. Various solutions can be developed during production planning which may differ regarding the number and type of the machines employed, as well as regarding the number of workers. The criteria production costs f₁and production time f₂, both of which are determined by the objective function, serve as evaluation criteria for each solution.

3.2.1 PARETO-ranking

The superiority of one solution over the other can be decided by comparing 2 solutions. It can be carried out both generally and for as many criteria as desired, following the schema in equation 3-7.

If solution₁ is p< (partially less than) solution₂, it follows that solution₁ dominates solution₂. In the example used here this means: if costs and time are less for solution₁ than for solution₂, it follows that solution₁ is superior to solution₂. It would even be sufficient if one of the two values was equal for both solutions (equal costs) and only the other value was lower (less time).

If, however, none of the solutions dominates the other both solutions are to be regarded as equivalent with respect to the PARETO-order. The same rank is assigned to individuals which do not dominate each other.

The rank of an individual within the population (rank_i) depends on the number of individuals (NumInd_dominated) dominating this individual [Fon95]:

All solutions that are found during optimization and are not dominated by a different solution constitute the PARETO-optimal solutions (PARETO-optimal set) of this problem (PARETO-optimality). These solutions are assigned a rank value of 1. In the case of each PARETO-optimal solution it is not possible to improve one of the criteria without one or several of the other criteria deteriorating.

3.2.2 Goal attainment or method of inequalities

When using plain PARETO-ranking in equation 3-7 all PARETO-optimal solutions are equivalent. It is possible, however, to make a further differentiation for many practical problems. As above, the example of object production shall be used to illustrate this. In an extreme case a solution could produce nothing at all. In this case, the costs would equal zero and the production time would be infinite. No other solution could produce the objects at lower costs. Thus, as this solution cannot be dominated, it would belong to the PARETO-optimal solutions if PARETO-dominance was applied exclusively. In a second extreme case a solution could produce objects in a very short time. This would result in very high expenditure, and therefore very high costs. It should be obvious that both cases are not desirable although they belong to the non-dominated solutions which are not dominated.

Goals for the individual criteria can be introduced in order to preclude PARETO-optimal set solutions, which lie outside of the results desired by the user. A solution is only acceptable when the goals for the individual criteria have been reached. This procedure is also termed method of inequalities , MOI, or goal programming. The individual goals are predefined as inequalities.

In the production example used here, this could mean that an upper limit for costs and production time is determined. The result is not acceptable until both goals are simultaneously reached or fallen short of by means of a solution.

When the goals are included in the multi-objective ranking the comparison of two solutions becomes somewhat more complex. The following assumptions are made:

In the definitions the operator partially less p< from equation 3-7 is used for each of the following definitions. It is possible to distinguish 3 different cases.

PARETO-ranking using a vector of goals for the individual criteria, allows for an improved generation of the sequence of a number of solutions compared to plain PARETO-ranking.

Multi-objective optimization is carried out in order to find a number of non-dominated solutions, the PARETO-optimal set. A normal evolutionary algorithm, however, converges at a single solution. This process is termed a genetic drift. For this reason, methods must be built in, which achieve the maintenance or expansion of population diversity (prevention of premature convergence).

3.2.3 Sharing

On the one hand, the genetic drift can be counteracted by the application of fitness sharing). On the other hand, the algorithm has the effect that a greater part of the pareto-optimal solution is ascertained. The basic principle is that individuals in a particular niche must share the resources available amongst themselves. In this way, the more individuals are near an individual, the lower its fitness becomes.

During application the size of the niches and the allocation of resources within each niche must be established. Methods for fitness sharing are suggested in [HN93], [SD94] and [Fon95],amongst others.

3.2.4 Further information on multi-objective optimization

In this subsection it was only possible to provide a short introduction to the multi-objective fitness assignment within the context of evolutionary algorithms. The relevant literature should be referred to for more specific questions regarding individual procedures: [Hor97], [Fon95], [ZT98], [HN93], [SD94], [Vel99]. A list of a large amount of literature on multi-objective optimization with evolutionary algorithms can be accessed in [Coe99]. There is a great deal of literature on multi-objective optimization not specific to the evolutionary algorithm. [Mie99] is recommended as a good starting point.

3.2.5 Weighted sum - aggregation or scalarization of multiple objectives

When considering different methods and component parts used for multi-objective optimization one should not forget classic methods for the integration of several criteria (scalarization method, also called aggregation of objectives).

The weighted sum is the most well-known method. Here each criterion is assigned a weighting value. By means of the linear combination of all the weighted criteria a combined objective function value F_ws is achieved:

The weighted sum is especially used in cases when the varying significance of the individual criteria is known or can be estimated. As this is frequently the case where practical application is concerned, the weighted sum is often applied.

If a multi-objective problem is solved by means of single-objective optimization, only a point solution is obtained. The advantage of obtaining several solutions of equal value relating to a target vector is lost. At this stage the user must decide whether the simple use of the weighted sum or the approximation of the PARETO-optimal solutions is more important for the solution of the problem.

The simplest selection scheme is roulette-wheel selection, also called stochastic sampling with replacement [Bak87]. This is a stochastic algorithm and involves the following technique:

The individuals are mapped to contiguous segments of a line, such that each individual's segment is equal in size to its fitness. A random number is generated and the individual whose segment spans the random number is selected. The process is repeated until the desired number of individuals is obtained (called mating population). This technique is analogous to a roulette wheel with each slice proportional in size to the fitness, see figure .

Table shows the selection probability for 11 individuals, linear ranking and selective pressure of 2 together with the fitness value. Individual 1 is the most fit individual and occupies the largest interval, whereas individual 10 as the second least fit individual has the smallest interval on the line (see figure ). Individual 11, the least fit interval, has a fitness value of 0 and get no chance for reproduction

Number of individual	1	2	3	4	5	6	7	8	9	10	11
fitness value	2.0	1.8	1.6	1.4	1.2	1.0	0.8	0.6	0.4	0.2	0.0
selection probability	0.18	0.16	0.15	0.13	0.11	0.09	0.07	0.06	0.03	0.02	0.0

For selecting the mating population the appropriate number of uniformly distributed random numbers (uniform distributed between 0.0 and 1.0) is independently generated.

Figure shows the selection process of the individuals for the example in table together with the above sample trials.

The roulette-wheel selection algorithm provides a zero bias but does not guarantee minimum spread.

Stochastic universal sampling [Bak87] provides zero bias and minimum spread. The individuals are mapped to contiguous segments of a line, such that each individual's segment is equal in size to its fitness exactly as in roulette-wheel selection. Here equally spaced pointers are placed over the line as many as there are individuals to be selected. Consider NPointer the number of individuals to be selected, then the distance between the pointers are 1/NPointer and the position of the first pointer is given by a randomly generated number in the range [0, 1/NPointer].

For 6 individuals to be selected, the distance between the pointers is 1/6=0.167. Figure shows the selection for the above example.

Stochastic universal sampling ensures a selection of offspring which is closer to what is deserved then roulette wheel selection.

In local selection every individual resides inside a constrained environment called the local neighborhood. (In the other selection methods the whole population or subpopulation is the selection pool or neighborhood.) Individuals interact only with individuals inside this region. The neighborhood is defined by the structure in which the population is distributed. The neighborhood can be seen as the group of potential mating partners.

The first step is the selection of the first half of the mating population uniform at random (or using one of the other mentioned selection algorithms, for example, stochastic universal sampling or truncation selection). Now a local neighborhood is defined for every selected individual. Inside this neighborhood the mating partner is selected (best, fitness proportional, or uniform at random).

The distance between possible neighbors together with the structure determines the size of the neighborhood. Table gives examples for the size of the neighborhood for the given structures and different distance values.

Fig. 3-6: Two-dimensional neighborhood; left: full and half cross, right: full and half star

Between individuals of a population an `isolation by distance' exists. The smaller the neighborhood, the bigger the isolation distance. However, because of overlapping neighborhoods, propagation of new variants takes place. This assures the exchange of information between all individuals.

The size of the neighborhood determines the speed of propagation of information between the individuals of a population, thus deciding between rapid propagation or maintenance of a high diversity/variability in the population. A higher variability is often desired, thus preventing problems such as premature convergence to a local minimum. Similar results were drawn from simulations in [VBS91]. Local selection in a small neighborhood performed better than local selection in a bigger neighborhood. Nevertheless, the interconnection of the whole population must still be provided. Two-dimensional neighborhood with structure half star using a distance of 1 is recommended for local selection. However, if the population is bigger (>100 individuals) a greater distance and/or another two-dimensional neighborhood should be used.


	distance
structure of selection	1	2
full ring	2	4
half ring	1	2
full cross	4	8 (12)
half cross	2	4 (5)
full star	8	24
half star	3	8

Compared to the previous selection methods modeling natural selection truncation selection is an artificial selection method. It is used by breeders for large populations/mass selection.

In truncation selection individuals are sorted according to their fitness. Only the best individuals are selected for parents. These selected parents produce uniform at random offspring. The parameter for truncation selection is the truncation threshold Trunc. Trunc indicates the proportion of the population to be selected as parents and takes values ranging from 50%-10%. Individuals below the truncation threshold do not produce offspring. The term selection intensity is often used in truncation selection. Table shows the relation between both.

truncation threshold	1%	10%	20%	40%	50%	80%
selection intensity	2.66	1.76	1.2	0.97	0.8	0.34

3.6.1 Analysis of truncation selection

In [BT95] an analysis of truncation selection can be found. The same results have been derived in a different way in [CK70] as well.

In tournament selection [GD91] a number Tour of individuals is chosen randomly from the population and the best individual from this group is selected as parent. This process is repeated as often as individuals must be chosen. These selected parents produce uniform at random offspring. The parameter for tournament selection is the tournament size Tour. Tour takes values ranging from 2 to Nind (number of individuals in population). Table and figure show the relation between tournament size and selection intensity [BT95].

tournament size	1	2	3	5	10	30
selection intensity	0	0.56	0.85	1.15	1.53	2.04

3.7.1 Analysis of tournament selection

As shown in the previous sections of this chapter the selection methods behave similarly assuming similar selection intensity.

3.8.1 Selection parameter and selection intensity

Figure shows the relation between selection intensity and the appropriate parameters of the selection methods (selective pressure, truncation threshold and tournament size). It should be stated that with tournament selection only discrete values can be assigned and linear ranking selection allows only a smaller range for the selection intensity.

However, the behavior of the selection methods is different. Thus, the selection methods will be compared on the parameters loss of diversity (figure ) and selections variance (figure ) on the selection intensity.

3.8.2 Loss of diversity and selection intensity

Truncation selection leads to a much higher loss of diversity for the same selection intensity compared to ranking and tournament selection. Truncation selection is more likely to replace less fit individuals with fitter offspring, because all individuals below a certain fitness threshold do not have a probability to be selected. Ranking and tournament selection seem to behave similarly. However, ranking selection works in an area where tournament selection does not work because of the discrete character of tournament selection.

3.8.3 Selection variance and selection intensity

For the same selection intensity truncation selection leads to a much smaller selection variance than ranking or tournament selection. As can be seen clearly ranking selection behaves similar to tournament selection. However, again ranking selection works in an area where tournament selection does not work because of the discrete character of tournament selection. In [BT95] was proven that the fitness distribution for ranking and tournament selection for SP=2 and Tour=2 (SelInt=1/sqrt(pi)) is identical.

This document is part of version 3.7 of the GEATbx: Genetic and Evolutionary Algorithm Toolbox for use with Matlab - www.geatbx.com.
The Genetic and Evolutionary Algorithm Toolbox is not public domain.
© 1994-2005 Hartmut Pohlheim, All Rights Reserved, (support@geatbx.com).

Evolutionary Algorithms 3 Selection

3 Selection

3.1 Rank-based fitness assignment

3.1.1 Linear ranking

3.1.2 Non-linear ranking

3.1.3 Comparison of linear and non-linear ranking

3.1.4 Analysis of linear ranking

3.2 Multi-objective Ranking

3.2.1 PARETO-ranking

3.2.2 Goal attainment or method of inequalities

3.2.3 Sharing

3.2.4 Further information on multi-objective optimization

3.2.5 Weighted sum - aggregation or scalarization of multiple objectives

3.3 Roulette wheel selection

3.4 Stochastic universal sampling

3.5 Local selection

3.6 Truncation selection

3.6.1 Analysis of truncation selection

3.7 Tournament selection

3.7.1 Analysis of tournament selection

3.8 Comparison of selection schemes

3.8.1 Selection parameter and selection intensity

3.8.2 Loss of diversity and selection intensity

3.8.3 Selection variance and selection intensity