source: branches/OKBJavaConnector/ECJClient/src/ec/parsimony/README @ 12912

Last change on this file since 12912 was 6152, checked in by bfarka, 14 years ago
added ecj and custom statistics to communicate with the okb services #1441
File size: 8.6 KB

ECJ supports several bloat control techniques. Many of these techniques
are compared in detail in "A Comparison of Bloat Control Methods for
Genetic Programming" by Sean Luke and Liviu Panait. In this directory
we have implementations of several of them.

There are several methods in the article which aren't here. The two you
should be aware of are BIASED MULTIOBJECTIVE, where we do multiobjective
optimization with fitness as one objective and size as another; and
plain-old LINEAR parsimony pressure, where the "fitness" F of an
individual is actually its real fitness R and its size S, combined in a
linear function, that is, F = A*R + S for some value of A. We mention
these two because, like many of the techniques below, they perform well
over many different problem domains. And importantly, LINEAR performed
the best in our tests, closely followed by RATIO BUCKETED TOURNAMENT
SELECTION (see below). Double Tournament was pretty good too. The
problem with linear parsimony pressure is that it may need to be tuned
carefully -- though a setting of A = 32 seemed to work well in many
problems. Thus you may wish to try out simple linear parsimony pressure
before going to a more exotic method. You can implement linear parsimony
pressure in your evaluation function: compute the fitness, then call the
size() method on the individual, then set the fitness of the individual
to the linear combination of the two.
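
As a rough illustration of the linear combination just described, here
is a minimal standalone sketch. It does not use ECJ's classes: the names
LinearParsimonySketch and combinedFitness are hypothetical, and it
assumes a fitness measure where lower is better, so that adding the size
acts as a penalty.

```java
// Hypothetical standalone sketch; the class and method names are not ECJ's.
final class LinearParsimonySketch {
    private LinearParsimonySketch() {}

    /**
     * Returns F = A*R + S.
     * Assumption: lower F is better, so the size term acts as a penalty.
     */
    static double combinedFitness(double a, double rawFitness, long size) {
        return a * rawFitness + size;
    }

    public static void main(String[] args) {
        double a = 32.0; // the setting the text reports working well in many problems
        // Two individuals with equal raw fitness: the smaller one gets the lower F.
        System.out.println(combinedFitness(a, 0.5, 40)); // 32*0.5 + 40
        System.out.println(combinedFitness(a, 0.5, 12)); // 32*0.5 + 12
    }
}
```

In a real evaluation function the size argument would come from the
individual's size() method, as described above.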

Another technique commonly used to control bloat in GP is depth
limiting. ECJ implements depth limiting Koza-style with a maximum depth
set to 17. You can often get better results with a smaller depth limit.
Interestingly, when the depth limit is set to 17, you can use depth
limiting in COMBINATION with ANY of the parsimony pressure techniques
discussed here (including linear and biased multiobjective), and the
result is typically better than using them separately.



DOUBLE TOURNAMENT

Double tournament is a two-layer hierarchy of tournament selection
operators. Some N tournament selections are performed on one criterion,
and then the winners of those tournaments become contestants in a final
tournament on the other criterion. The winner of the final tournament
becomes the selected individual. You can have fitness as the first
criterion and size as the second criterion, or the other way around.
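
The two-layer structure can be sketched as follows. This is a standalone
illustration, not ECJ's DoubleTournamentSelection: the class and method
names are hypothetical, both criteria are passed as arrays where a lower
value is better, the number of qualifying tournaments is taken to equal
the final tournament size, and only integer tournament sizes are
modelled (fractional sizes such as 1.4 are not).

```java
import java.util.Random;

// Hypothetical standalone sketch; not ECJ's DoubleTournamentSelection.
final class DoubleTournamentSketch {
    private DoubleTournamentSketch() {}

    /** Plain tournament: best of `size` random picks, where a lower key is better. */
    static int tournament(double[] key, int size, Random rng) {
        int best = rng.nextInt(key.length);
        for (int i = 1; i < size; i++) {
            int challenger = rng.nextInt(key.length);
            if (key[challenger] < key[best]) best = challenger;
        }
        return best;
    }

    /**
     * Runs `finalSize` qualifying tournaments of `qualifierSize` on firstKey,
     * then returns the qualifier winner with the lowest secondKey.
     */
    static int select(double[] firstKey, double[] secondKey,
                      int qualifierSize, int finalSize, Random rng) {
        int best = tournament(firstKey, qualifierSize, rng);
        for (int t = 1; t < finalSize; t++) {
            int winner = tournament(firstKey, qualifierSize, rng);
            if (secondKey[winner] < secondKey[best]) best = winner;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] length = {30, 12, 55, 8};      // smaller is better
        double[] error = {0.2, 0.9, 0.1, 0.7};  // assumed: lower error is better
        // Length decides the qualifiers, fitness decides the final.
        int chosen = select(length, error, 2, 7, new Random(42));
        System.out.println("selected index " + chosen);
    }
}
```

Swapping the first two arguments of select gives the other ordering,
with fitness deciding the qualifiers and size deciding the final.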

Here are good settings we've found for typical GP experiments.

[BASE] = ec.parsimony.DoubleTournamentSelection
# Use length for the initial tournaments, and fitness for the final tournament
[BASE].do-length-first = true
# The initial tournaments are of size 1.4
[BASE].size = 1.4
# The final tournament is of size 7
[BASE].size2 = 7

The default base is
select.double-tournament


PROPORTIONAL TOURNAMENT

Proportional tournament is a single tournament selection, but some
percentage of the time the tournament is according to size rather than
according to fitness.
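
A minimal standalone sketch of this idea follows. It is not ECJ's
ProportionalTournamentSelection: the names are hypothetical, and both
fitness and size are passed as arrays where a lower value is assumed to
be better.

```java
import java.util.Random;

// Hypothetical standalone sketch; not ECJ's ProportionalTournamentSelection.
final class ProportionalTournamentSketch {
    private ProportionalTournamentSketch() {}

    /** Plain tournament: best of `size` random picks, where a lower key is better. */
    static int tournament(double[] key, int size, Random rng) {
        int best = rng.nextInt(key.length);
        for (int i = 1; i < size; i++) {
            int challenger = rng.nextInt(key.length);
            if (key[challenger] < key[best]) best = challenger;
        }
        return best;
    }

    /** With probability fitnessProb the tournament is by fitness, otherwise by length. */
    static int select(double[] fitness, double[] length,
                      int size, double fitnessProb, Random rng) {
        double[] key = rng.nextDouble() < fitnessProb ? fitness : length;
        return tournament(key, size, rng);
    }

    public static void main(String[] args) {
        double[] error = {0.2, 0.9, 0.1, 0.7}; // assumed: lower error is better
        double[] length = {30, 12, 55, 8};     // smaller is better
        int chosen = select(error, length, 7, 0.8, new Random(42));
        System.out.println("selected index " + chosen);
    }
}
```

With fitnessProb set to 1.0 the sketch reduces to a regular tournament
on fitness alone.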

Here are good settings we've found for typical GP experiments.

[BASE] = ec.parsimony.ProportionalTournamentSelection
# The size of the tournament
[BASE].size = 7
# The probability that the tournament is by fitness (1.0 is equivalent
# to "regular" tournament selection).
[BASE].fitness-prob = 0.8

The default base is
select.proportional-tournament



LEXICOGRAPHIC TOURNAMENT

Lexicographic tournament selection is simple. We do a tournament
selection by fitness, breaking ties by choosing the smaller individual.
Thus size is a secondary consideration: for example, if all your fitness
values are likely to be different, then size will rarely have an effect.
Plain lexicographic tournament selection therefore works best when there
are a limited number of possible fitness values.
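
The tie-breaking rule can be sketched as a comparison on (fitness, size)
pairs. This is a standalone illustration with hypothetical names, not
ECJ's implementation, and it assumes that a lower fitness value is
better.

```java
import java.util.Random;

// Hypothetical standalone sketch; not ECJ's lexicographic selection class.
final class LexicographicTournamentSketch {
    private LexicographicTournamentSketch() {}

    /** True if (f1, s1) beats (f2, s2): lower fitness first, then smaller size. */
    static boolean better(double f1, long s1, double f2, long s2) {
        if (f1 != f2) return f1 < f2;
        return s1 < s2;
    }

    /** Tournament of `tournamentSize` random picks using the comparison above. */
    static int select(double[] fitness, long[] size, int tournamentSize, Random rng) {
        int best = rng.nextInt(fitness.length);
        for (int i = 1; i < tournamentSize; i++) {
            int c = rng.nextInt(fitness.length);
            if (better(fitness[c], size[c], fitness[best], size[best])) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] error = {1.0, 1.0, 2.0}; // two individuals tie on fitness
        long[] size = {40, 15, 5};
        System.out.println("selected index " + select(error, size, 7, new Random(42)));
    }
}
```

Size only matters in better() when the two fitness values are exactly
equal, which is why the method suits problems with few distinct fitness
values.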

Lexicographic tournament has no special settings -- it's basically the
same as plain tournament selection. Here's how we'd set it up for GP
problems:

[BASE] = ec.parsimony.LexicographicTournamentSelection
# The size of the tournament
[BASE].size = 7

The default base is
select.lexicographic-tournament



[DIRECT] BUCKETED TOURNAMENT SELECTION

Bucketed tournament selection is an improvement of sorts over plain
lexicographic tournament selection. The idea is to create an artificial
equivalency of fitness values, even when none exists in reality, for
purposes of lexicographic selection. This allows size to become a
significant factor. To do this, we create a set of N buckets. The
population, of size S, is then sorted and divided into these buckets.
It's not divided quite equally. Instead, the bottom S/N individuals are
placed in the first bucket. Then any individuals left in the population
whose fitness equals that of the fittest individual in that bucket are
also put in that bucket. Then the bottom S/N of the remaining
individuals in the population are put in the second bucket, plus any
individuals whose fitness equals that of the fittest individual in the
second bucket. And so on. This continues until we've run out of
individuals to put into buckets. The idea is to make sure that
individuals with the same fitness are all placed into the same bucket.
The "fitness" of an individual, for purposes of lexicographic selection,
is now its bucket number.
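
The bucket assignment can be sketched as below. This is a standalone
illustration with hypothetical names, not ECJ's BucketTournamentSelection.
It assumes the fitness values are already sorted from worst to best, and
that S/N is rounded down with a minimum of one individual per bucket.

```java
import java.util.Arrays;

// Hypothetical standalone sketch; not ECJ's BucketTournamentSelection.
final class DirectBucketSketch {
    private DirectBucketSketch() {}

    /**
     * Returns the bucket number of each individual.
     * Input must be sorted worst to best; bucket 0 holds the worst individuals.
     */
    static int[] assign(double[] worstToBest, int numBuckets) {
        int n = worstToBest.length;
        int quota = Math.max(1, n / numBuckets); // S/N, rounded down (assumption)
        int[] bucket = new int[n];
        int current = 0;
        int filled = 0;
        for (int i = 0; i < n; i++) {
            // Open a new bucket once the quota is met, unless this individual
            // ties with the fittest member already in the bucket.
            if (filled >= quota && worstToBest[i] != worstToBest[i - 1]) {
                current++;
                filled = 0;
            }
            bucket[i] = current;
            filled++;
        }
        return bucket;
    }

    public static void main(String[] args) {
        double[] sorted = {9, 8, 8, 8, 5, 4, 3, 3}; // worst first
        System.out.println(Arrays.toString(assign(sorted, 4)));
    }
}
```

In this example the three tied individuals with value 8 all land in
bucket 0, so that bucket ends up larger than S/N = 2.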

We did not find a direct bucketing number-of-buckets parameter which was
good across several problem domains. We found 100 was good for
artificial ant, 250 for 11-bit Multiplexer and 5-bit Parity, and 25 for
Symbolic Regression. You'll need to experiment a bit. Here are the
settings for Multiplexer:

[BASE] = ec.parsimony.BucketTournamentSelection
# The size of the tournament
[BASE].size = 7
# The number of buckets
[BASE].num-buckets = 250

The default base is
select.bucket-tournament



RATIO BUCKETED TOURNAMENT SELECTION

Ratio bucketing improves a bit over direct bucketing. Here, the idea is
to push low-fitness individuals into large buckets and place
high-fitness individuals into smaller buckets, even as small as a single
individual. This allows more fitness-based distinction among the
"important" individuals in the search (the fitter ones) and puts more
parsimony pressure on the "less important" individuals. We do this by
defining a ratio of remaining individuals in the population to put in
the next bucket. Let's say this ratio is 1/R. We put the worst 1/R of
the population in the lowest bucket, plus all remaining individuals in
the population whose fitness is equal to that of the fittest individual
in the bucket. We then put the worst 1/R of the remaining individuals in
the next bucket, plus all remaining individuals in the population whose
fitness is equal to that of the fittest individual in the second bucket.
And so on, until all individuals have been placed into buckets. The
"fitness" of an individual, for purposes of lexicographic selection, is
now its bucket number.
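
The ratio scheme can be sketched in the same style. This is a standalone
illustration with hypothetical names, not ECJ's
RatioBucketTournamentSelection. It assumes the fitness values are
already sorted from worst to best, that R is an integer, and that each
quota is rounded down with a minimum of one individual.

```java
import java.util.Arrays;

// Hypothetical standalone sketch; not ECJ's RatioBucketTournamentSelection.
final class RatioBucketSketch {
    private RatioBucketSketch() {}

    /**
     * Returns the bucket number of each individual.
     * Input must be sorted worst to best; bucket 0 holds the worst individuals.
     */
    static int[] assign(double[] worstToBest, int ratio) {
        int n = worstToBest.length;
        int[] bucket = new int[n];
        int current = 0;
        int i = 0;
        while (i < n) {
            // Take 1/R of the individuals still unplaced (at least one).
            int quota = Math.max(1, (n - i) / ratio);
            int end = i + quota;
            // Extend the bucket over individuals tied with its fittest member.
            while (end < n && worstToBest[end] == worstToBest[end - 1]) end++;
            for (; i < end; i++) bucket[i] = current;
            current++;
        }
        return bucket;
    }

    public static void main(String[] args) {
        double[] sorted = {9, 8, 7, 6, 5, 4, 3, 2}; // worst first
        System.out.println(Arrays.toString(assign(sorted, 2)));
    }
}
```

With R = 2 and eight distinct values, the buckets hold 4, 2, 1 and 1
individuals, so the fittest individuals each get a bucket of their own.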

As with direct bucketing, we did not find a value of R which was good
across several problem domains. 2 was good for artificial ant, 11-bit
Multiplexer, and Symbolic Regression, but 6 was good for 5-bit Parity.
So you'll need to experiment. Here are the settings for Multiplexer:

[BASE] = ec.parsimony.RatioBucketTournamentSelection
# The size of the tournament
[BASE].size = 7
# The ratio R: each bucket takes 1/R of the remaining individuals
[BASE].ratio = 2

The default base is
select.ratio-bucket-tournament



TARPEIAN SELECTION

Tarpeian is fairly simple but clever. PRIOR to evaluation, we sort the
population by size, then identify the M individuals which have
above-average size. From those M individuals we "kill" some N percent.
Notice that M may vary from population to population depending on the
variance of size among the individuals.

By "kill" we mean that we set the fitness of those individuals to a very
bad value, and also mark them as evaluated so the Evaluator doesn't
bother evaluating them.
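
The marking step can be sketched as follows. This is a standalone
illustration with hypothetical names, not ECJ's TarpeianStatistics. As
an assumption, it kills each above-average individual independently with
probability killProportion rather than killing an exact percentage, and
it skips the sort since only the mean size is needed here.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical standalone sketch; not ECJ's TarpeianStatistics.
final class TarpeianSketch {
    private TarpeianSketch() {}

    /**
     * Returns killed[i] = true for each individual that should receive a very
     * bad fitness and be marked as already evaluated.
     */
    static boolean[] markKilled(long[] sizes, double killProportion, Random rng) {
        double mean = 0;
        for (long s : sizes) mean += s;
        mean /= sizes.length;
        boolean[] killed = new boolean[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            // Only above-average individuals are candidates (assumed: strictly above).
            if (sizes[i] > mean && rng.nextDouble() < killProportion) {
                killed[i] = true;
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        long[] sizes = {5, 12, 40, 7, 33, 9};
        System.out.println(Arrays.toString(markKilled(sizes, 0.3, new Random(42))));
    }
}
```

A caller would then assign the bad fitness and set the evaluated flag
for every index marked true, before the evaluator runs.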

Not evaluating the individuals is really important: if every individual
is evaluated, Tarpeian is actually pretty costly compared to other
methods. But if we prematurely "kill" the individuals, then Tarpeian is
pretty competitive if you count the total number of evaluations.

Because Tarpeian must do its work prior to evaluation, it can't operate
as a selection operator in ECJ's framework. Instead, we've arranged for
Tarpeian to be a Statistics subclass which plugs into the
preEvaluationStatistics hook. To use it, you just hang it off of your
statistics chain. Assuming you only have one existing Statistics
object, here's how you'd add Tarpeian in a manner which has proven good
across several problem domains:

stat.num-children = 1
stat.child.0 = ec.parsimony.TarpeianStatistics
stat.child.0.kill-proportion = 0.3

Note that our implementation of Tarpeian will operate over all of your
subpopulations, even if you don't want that. You may need to hack it to
operate differently if you have more than one subpopulation and don't
want Tarpeian parsimony on one or more of them.
