Free cookie consent management tool by TermsFeed Policy Generator

source: branches/OKBJavaConnector/ECJClient/src/ec/parsimony/README @ 13229

Last change on this file since 13229 was 6152, checked in by bfarka, 14 years ago

added ecj and custom statistics to communicate with the okb services #1441

File size: 8.6 KB
Line 
1ECJ supports several bloat control techniques.  Many of these techniques
2are compared in detail in "A Comparison of bloat Control Methods for
3Genetic Programming" by Sean Luke and Liviu Panait.  In this directory
4we have implementations of several of them.
5
6There are several methods in the article which aren't here.  The two you
7should be aware of are BIASED MULTIOBJECTIVE, where we do multiobjective
8optimization with fitness as one objective and size as another; and
9plain-old LINEAR parsimony pressure, where the "fitness" F of an
10individual is actually his real fitness R and his size S, combined in a
11linear function, that is, F = A*R + S for some value of R.  We mention
12these two because, like many of the techniques below, they perform well
13over many different problem domains.  And importantly, LINEAR performed
14the best in our tests!  Closely followed by RATIO TOURNAMENT SELECTION
15(see below).  Double Tournament was pretty good too.  The problem with
16linear parsimony pressure is that it could need to be tuned carefully --
17though a setting of A = 32 seemed to work well in many problems.  Thus
18you may wish to try out doing a simple linear parsimony pressure before
19going to a more exotic method.  You can implement linear parsimony
20pressure in your evaluation function: compute the fitness, then call the
21size() method on the individual, then set the fitness of the individual
22to the linear combination of the two.
23
24Another technique commonly used to control bloat in GP is depth
25limiting.  ECJ implements depth limiting Koza-style with a maximum depth
26set to 17.  You can often get better results with a smaller depth limit.
27Interestingly, when the depth limit is set to 17, you can use depth
28limiting in COMBINATION with ANY of the parsimony pressure techniques
29discussed here, (including linear and biased multiobjective) and the
30result is typically better than using them separately.
31
32
33
34DOUBLE TOURNAMENT
35
36Double tournament is a two-layer hierarchy of tournament selection
37operators. Some N tournament selections are performed on some criterion;
38and then the winners of those tournaments become contestants in a final
39tournament.  The winner of the final tournament becomes the selected
40individual.  You can have fitness as the first criterion and size as the
41second criterion, or the other way around.
42
43Here are good settings we've found for typical GP experiments.
44
45[BASE] = ec.parsimony.DoubleTournamentSelection
46# Do length as the initial tournaments, and fitness as the final tournament
47[BASE].do-length-first = true
48# The initial tournaments are of size 1.4
49[BASE].size = 1.4
50# The final tournament is of size 7
51[BASE].size2 = 7
52
53The default base is
54  select.double-tournament
55
56
57PROPORTIONAL TOURNAMENT
58
59Proportional tournament is a single tournament selection; but some
60percentage of the time the tournament is according to size rather than
61according to fitness.
62
63Here are good settings we've found for typical GP experiments.
64
65[BASE] = ec.parsimony.ProportionalTournamentSelection
66# The size of the tournament
67[BASE].size = 7
68# The probability that the tournament is by fitness (1.0 is equivalent
69# to "regular" tournament selection).
70[BASE].fitness-prob = 0.8
71
72The default base is
73  select.proportional-tournament
74
75
76
77LEXICOGRAPHIC TOURNAMENT
78
79Lexicographic tournament selection is simple.  We do a tournament
80selection by fitness, breaking ties by choosing the smaller individual.
81Thus size is a secondary consideration: for example, if all your fitness
82values are likely to be different, then size will never have an effect.
83Thus plain lexicographic tournament selection works best when there are
84a limited number of possible fitness values.
85
86Lexicographic tournament has no special settings -- it's basically the
87same as plain tournament selection.  Here's how we'd set it up for GP
88problems:
89
90[BASE] = ec.parsimony.LexicographicTournament
91# The size of the tournament
92[BASE].size = 7
93
94The default base is
95  select.lexicographic-tournament
96
97
98
99[DIRECT] BUCKETED TOURNAMENT SELECTION
100
101Bucketed tournament selection is an improvement of sorts over plain
102Lexicographic tournament selection.  The idea is to create an artificial
103equivalency of fitness values, even when none exists in reality, for
104purposes of lexicographic selection.  This allows size to become a
105significant factor.  to create a set of N buckets.  The population, of
106size S, is then sorted and divided into these buckets.  It's not divided
107quite equally.  Instead, the bottom S/N individuals are placed in the
108first bucket.  Then any individuals left in the population whose fitness
109equals the fittest individual in that bucket are *also* put in that
110bucket.  Then the bottom S/N of the remaining individuals in the
111population are put in the second bucket, plus any individuals whose
112fitness equals the fittest individual in the second bucket. And so on.
113This continues until we've run out of individuals to put into buckets.
114The idea is to make sure that individuals with the same fitness are all
115placed into the same bucket.  The "fitness" of an individual, for
116purposes of lexicographic selection, is now his bucket number.
117
118We did not find a direct bucketing number-of-buckets parameter which was
119good across several problem domains.  We found 100 was good for
120artificial ant, 250 for 11-bit Multiplexer and 5-bit Parity, and 25 for
121Symbolic Regression.  You'll need to experiment a bit.  Here's the
122settings for Multiplexer:
123
124[BASE] = ec.parsimony.BucketTournamentSelection
125# The size of the tournament
126[BASE].size = 7
127# The number of buckets
128[BASE].num-buckets = 250
129
130The default base is
131  select.bucket-tournament
132
133
134
135RATIO BUCKETED TOURNAMENT SELECTION
136
137Ratio Bucketing improves a bit over direct bucketing.  Here, the idea is
138to push low-fitness individuals in to large buckets and place
139high-fitness individuals into smaller buckets, even as small as the
140individual itself.  This allows more fitness-based distinction among the
141"important" individuals in the search (the fitter ones) and puts more
142parsimony pressure in the "less important" individuals.  We do this by
143defining a ratio of remaining individuals in the population to put in
144the next bucket.  Let's say this ratio is 1/R.  We put the 1/R worst
145individuals of the population in lowest bucket, plus all remaining
146individuals in the population whose fitness is equal to the fittest
147individual in the bucket.  We then put the 1/4 next worst remaining
148individuals in the next bucket, plus all remaining individuals in the
149population whose fitness is equal to the fittest individual in the
150second bucket.  And so on, until all individuals have been placed into
151buckets.  The "fitness" of an individual, for purposes of lexicographic
152selection, is now his bucket number.
153
154Like direct bucketing, we did not find a value of R which was good
155across several problem domains.  2 was good for artificial ant, 11-bit
156mutiplexer, and regression.  but 6 ws good for 5-bit parity.  So you'll
157need to experiment.  Here's the settings for Multiplexer:
158
159[BASE] = ec.parsimony.RatioBucketTournamentSelection
160# The size of the tournament
161[BASE].size = 7
162# The number of buckets
163[BASE].ratio = 2
164
165The default base is
166        select.ratio-bucket-tournament
167
168
169
170TARPEIAN SELECTION
171
172Tarpeian is fairly simple but clever.  PRIOR to evaluation, we sort the
173population by size, then identify the M individuals which have
174above-average size. From those M individuals we "kill" some N percent.
175Notice that M may vary from population to population depending on the
176variance of size among the individuals.
177
178By "kill" we mean that we set the fitness of those individuals to a very
179bad value, and also mark them as evaluated so the Evaluator doesn't
180bother evaluating them.
181
182Not evaluating the individuals is really important: if every individual
183is evaluated, Tarpeian is actually pretty costly compared to other
184methods.  But if we prematurely "kill" the individuals, then Tarpeian is
185pretty competitive if you count total number of evaluations.
186
187Because Tarpeian must do its work prior to evaluation, it can't operate
188as a selection operator in ECJ's framework.  Instead, we've arranged for
189Tarpeian to be a Statistics subclass which plugs into the
190preEvaluationStatistics hook.  To use it, you just hang it off of your
191statistics chain.  Assuming you only have one existing Statistics
192object, here's how you'd add Tarpeian in a manner which has proven good
193across several problem domains:
194
195stat.num-children = 1
196stat.child.0 = ec.parsimony.TarpeianStatistics
197stat.child.0.kill-proportion = 0.3
198
199Note that our implementation of Tarpeian will operate over *all* of your
200subpopuations, even if you don't want that.  You may need to hack it to
201operate differently if you have more than one subpopulation and don't
202want Tarpeian parsimony on one or more of them.
203
204
205
206
Note: See TracBrowser for help on using the repository browser.