source: branches/OKBJavaConnector/ECJClient/src/ec/parsimony/README @ 9674

Last change on this file since 9674 was 6152, checked in by bfarka, 14 years ago
added ecj and custom statistics to communicate with the okb services #1441
File size: 8.6 KB

ECJ supports several bloat control techniques. Many of these techniques
are compared in detail in "A Comparison of Bloat Control Methods for
Genetic Programming" by Sean Luke and Liviu Panait. In this directory
we have implementations of several of them.

There are several methods in the article which aren't here. The two you
should be aware of are BIASED MULTIOBJECTIVE, where we do multiobjective
optimization with fitness as one objective and size as another; and
plain-old LINEAR parsimony pressure, where the "fitness" F of an
individual is actually its real fitness R and its size S, combined in a
linear function, that is, F = A*R + S for some value of A. We mention
these two because, like many of the techniques below, they perform well
over many different problem domains. And importantly, LINEAR performed
the best in our tests, closely followed by RATIO BUCKETED TOURNAMENT
SELECTION (see below). Double Tournament was pretty good too. The
problem with linear parsimony pressure is that it may need to be tuned
carefully -- though a setting of A = 32 seemed to work well in many
problems. Thus you may wish to try simple linear parsimony pressure
before going to a more exotic method. You can implement linear
parsimony pressure in your evaluation function: compute the fitness,
then call the size() method on the individual, then set the fitness of
the individual to the linear combination of the two.
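As a rough illustration of the linear combination described above, here
is a minimal standalone Python sketch (it is not ECJ code). The function
name and the lower-is-better fitness convention are assumptions made for
the example; the default A = 32 is the setting mentioned above.

```python
def linear_parsimony_fitness(raw_fitness: float, size: int, a: float = 32.0) -> float:
    """Combine raw fitness R and size S as F = A*R + S.

    Assumption: lower values are better (an error-style fitness), so
    adding the size penalizes larger individuals.
    """
    return a * raw_fitness + size


# Two individuals with the same raw fitness: the smaller one scores better.
small = linear_parsimony_fitness(0.5, 10)
large = linear_parsimony_fitness(0.5, 40)
```

With equal raw fitness the size term decides the comparison, and a
larger A makes size matter less relative to raw fitness.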

Another technique commonly used to control bloat in GP is depth
limiting. ECJ implements depth limiting Koza-style with a maximum depth
set to 17. You can often get better results with a smaller depth limit.
Interestingly, when the depth limit is set to 17, you can use depth
limiting in COMBINATION with ANY of the parsimony pressure techniques
discussed here (including linear and biased multiobjective), and the
result is typically better than using them separately.
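For readers unfamiliar with depth limiting, here is a small standalone
Python sketch of one common reading of the Koza-style rule: a child that
exceeds the depth limit is discarded and its parent is kept instead.
This is not ECJ's implementation. The tuple tree representation, the
function names, and the convention that a lone leaf has depth 1 are all
assumptions made for the example.

```python
def depth(tree):
    """Depth of a tree written as (label, child, child, ...).

    Assumption: a leaf such as ("x",) has depth 1.
    """
    children = tree[1:]
    return 1 + max((depth(c) for c in children), default=0)


def limit_depth(parent, child, max_depth=17):
    """Keep the child only if it respects the depth limit, else keep the parent."""
    return child if depth(child) <= max_depth else parent


parent = ("x",)
child = ("+", ("x",), ("*", ("x",), ("y",)))  # depth 3
```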



DOUBLE TOURNAMENT

Double tournament is a two-layer hierarchy of tournament selection
operators. Some N tournament selections are performed on some criterion;
and then the winners of those tournaments become contestants in a final
tournament. The winner of the final tournament becomes the selected
individual. You can have fitness as the first criterion and size as the
second criterion, or the other way around.

Here are good settings we've found for typical GP experiments.

[BASE] = ec.parsimony.DoubleTournamentSelection
# Do length as the initial tournaments, and fitness as the final tournament
[BASE].do-length-first = true
# The initial tournaments are of size 1.4
[BASE].size = 1.4
# The final tournament is of size 7
[BASE].size2 = 7

The default base is
select.double-tournament
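The two-layer scheme in this section can be sketched in standalone
Python as follows. This is an illustration, not ECJ's code. The names
are hypothetical, fitness is assumed to be higher-is-better, and a
fractional tournament size x is read here as "ceil(x) contestants with
probability x - floor(x), otherwise floor(x) contestants", which is an
assumption about how a size such as 1.4 might be interpreted.

```python
import math
import random
from typing import NamedTuple


class Individual(NamedTuple):
    fitness: float  # assumed: higher is better
    size: int       # smaller is preferred


def by_fitness(ind):
    return ind.fitness


def by_smallness(ind):
    return -ind.size


def tournament(pool, size, key, rng):
    """Return the best (largest key) of `size` randomly drawn contestants."""
    n = math.floor(size)
    if rng.random() < size - n:  # fractional part: sometimes draw one more
        n += 1
    contestants = [rng.choice(pool) for _ in range(n)]
    return max(contestants, key=key)


def double_tournament(pop, size=1.4, size2=7, do_length_first=True, rng=random):
    """Run `size2` initial tournaments, then a final one among their winners."""
    if do_length_first:
        first, final = by_smallness, by_fitness
    else:
        first, final = by_fitness, by_smallness
    finalists = [tournament(pop, size, first, rng) for _ in range(size2)]
    return max(finalists, key=final)
```

With the defaults, the initial tournaments apply only a mild preference
for small individuals, and the final tournament picks the fittest of
the seven winners.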


PROPORTIONAL TOURNAMENT

Proportional tournament is a single tournament selection; but some
percentage of the time the tournament is according to size rather than
according to fitness.

Here are good settings we've found for typical GP experiments.

[BASE] = ec.parsimony.ProportionalTournamentSelection
# The size of the tournament
[BASE].size = 7
# The probability that the tournament is by fitness (1.0 is equivalent
# to "regular" tournament selection).
[BASE].fitness-prob = 0.8

The default base is
select.proportional-tournament
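A minimal standalone Python sketch of the idea in this section, under
the assumptions that higher fitness is better and that contestants are
drawn with replacement. The names are hypothetical and this is not
ECJ's implementation.

```python
import random
from typing import NamedTuple


class Individual(NamedTuple):
    fitness: float  # assumed: higher is better
    size: int       # smaller is preferred


def proportional_tournament(pop, size=7, fitness_prob=0.8, rng=random):
    """One tournament, judged by fitness with probability `fitness_prob`,
    otherwise judged by size (smallest contestant wins)."""
    contestants = [rng.choice(pop) for _ in range(size)]
    if rng.random() < fitness_prob:
        return max(contestants, key=lambda ind: ind.fitness)
    return min(contestants, key=lambda ind: ind.size)
```

Setting fitness_prob to 1.0 reduces this to plain tournament
selection, matching the comment in the settings above.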



LEXICOGRAPHIC TOURNAMENT

Lexicographic tournament selection is simple. We do a tournament
selection by fitness, breaking ties by choosing the smaller individual.
Thus size is a secondary consideration: for example, if all your fitness
values are likely to be different, then size will never have an effect.
Thus plain lexicographic tournament selection works best when there are
a limited number of possible fitness values.

Lexicographic tournament has no special settings -- it's basically the
same as plain tournament selection. Here's how we'd set it up for GP
problems:

[BASE] = ec.parsimony.LexicographicTournamentSelection
# The size of the tournament
[BASE].size = 7

The default base is
select.lexicographic-tournament
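The tie-breaking rule described in this section can be sketched in
standalone Python by comparing (fitness, -size) pairs. The names are
hypothetical, fitness is assumed to be higher-is-better, and this is
an illustration rather than ECJ's code.

```python
import random
from typing import NamedTuple


class Individual(NamedTuple):
    fitness: float  # assumed: higher is better
    size: int       # smaller wins ties


def lexicographic_tournament(pop, size=7, rng=random):
    """Best fitness wins; among equal fitness, the smaller individual wins."""
    contestants = [rng.choice(pop) for _ in range(size)]
    return max(contestants, key=lambda ind: (ind.fitness, -ind.size))
```

Because size only enters the comparison when fitness values are
exactly equal, it has no effect on a population whose fitness values
are all distinct.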



[DIRECT] BUCKETED TOURNAMENT SELECTION

Bucketed tournament selection is an improvement of sorts over plain
lexicographic tournament selection. The idea is to create an artificial
equivalency of fitness values, even when none exists in reality, for
purposes of lexicographic selection. This allows size to become a
significant factor. To do this, we create a set of N buckets. The
population, of size S, is then sorted and divided into these buckets.
It's not divided quite equally. Instead, the bottom S/N individuals are
placed in the first bucket. Then any individuals left in the population
whose fitness equals the fittest individual in that bucket are also put
in that bucket. Then the bottom S/N of the remaining individuals in the
population are put in the second bucket, plus any individuals whose
fitness equals the fittest individual in the second bucket. And so on.
This continues until we've run out of individuals to put into buckets.
The idea is to make sure that individuals with the same fitness are all
placed into the same bucket. The "fitness" of an individual, for
purposes of lexicographic selection, is now its bucket number.

We did not find a direct bucketing number-of-buckets parameter which was
good across several problem domains. We found 100 was good for
artificial ant, 250 for 11-bit Multiplexer and 5-bit Parity, and 25 for
Symbolic Regression. You'll need to experiment a bit. Here are the
settings for Multiplexer:

[BASE] = ec.parsimony.BucketTournamentSelection
# The size of the tournament
[BASE].size = 7
# The number of buckets
[BASE].num-buckets = 250

The default base is
select.bucket-tournament
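The bucket assignment described in this section can be sketched in
standalone Python as below. It is an illustration, not ECJ's code. The
function name is hypothetical, fitness is assumed to be
higher-is-better, and S/N is rounded down (with a minimum of one
individual per bucket), which is an assumption.

```python
def direct_buckets(fitnesses, num_buckets):
    """Return a bucket number for each individual, in input order.

    Bucket 0 holds the worst individuals. A bucket is closed once it
    holds at least S/N individuals and the next individual has a
    different fitness, so equal fitnesses always share a bucket.
    """
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    per_bucket = max(1, len(fitnesses) // num_buckets)
    buckets = [0] * len(fitnesses)
    bucket = 0
    count = 0  # individuals placed in the current bucket so far
    for rank, i in enumerate(order):
        is_tie = rank > 0 and fitnesses[i] == fitnesses[order[rank - 1]]
        if count >= per_bucket and not is_tie:
            bucket += 1
            count = 0
        buckets[i] = bucket
        count += 1
    return buckets


# Six individuals, three buckets requested: the run of 2s is kept together.
example = direct_buckets([1, 2, 2, 2, 3, 4], 3)  # [0, 0, 0, 0, 1, 1]
```

In the example only two buckets end up being used, because the tied
individuals overfill the first one. The bucket number then stands in
for fitness in a lexicographic tournament.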



RATIO BUCKETED TOURNAMENT SELECTION

Ratio Bucketing improves a bit over direct bucketing. Here, the idea is
to push low-fitness individuals into large buckets and place
high-fitness individuals into smaller buckets, even as small as a
single individual. This allows more fitness-based distinction among the
"important" individuals in the search (the fitter ones) and puts more
parsimony pressure on the "less important" individuals. We do this by
defining a ratio of remaining individuals in the population to put in
the next bucket. Let's say this ratio is 1/R. We put the 1/R worst
individuals of the population in the lowest bucket, plus all remaining
individuals in the population whose fitness is equal to the fittest
individual in the bucket. We then put the 1/R next worst remaining
individuals in the next bucket, plus all remaining individuals in the
population whose fitness is equal to the fittest individual in the
second bucket. And so on, until all individuals have been placed into
buckets. The "fitness" of an individual, for purposes of lexicographic
selection, is now its bucket number.

Like direct bucketing, we did not find a value of R which was good
across several problem domains. 2 was good for artificial ant, 11-bit
multiplexer, and regression, but 6 was good for 5-bit parity. So you'll
need to experiment. Here are the settings for Multiplexer:

[BASE] = ec.parsimony.RatioBucketTournamentSelection
# The size of the tournament
[BASE].size = 7
# The ratio R (1/R of the remaining individuals go in each bucket)
[BASE].ratio = 2

The default base is
select.ratio-bucket-tournament
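Here is a standalone Python sketch of the ratio bucketing procedure
described in this section. It is not ECJ's implementation. The function
name is hypothetical, fitness is assumed to be higher-is-better, and
the 1/R share is rounded down with a minimum of one individual per
bucket, which is an assumption.

```python
def ratio_buckets(fitnesses, ratio):
    """Return a bucket number for each individual, in input order.

    Each bucket takes 1/ratio of the individuals not yet placed (worst
    first), plus any further individuals tied with its fittest member.
    """
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    buckets = [0] * len(fitnesses)
    bucket = 0
    start = 0
    while start < len(order):
        remaining = len(order) - start
        end = start + max(1, int(remaining / ratio))
        # Pull in individuals whose fitness ties the fittest one so far.
        while end < len(order) and fitnesses[order[end]] == fitnesses[order[end - 1]]:
            end += 1
        for i in order[start:end]:
            buckets[i] = bucket
        bucket += 1
        start = end
    return buckets


# Eight distinct fitness values with R = 2: bucket sizes 4, 2, 1, 1.
example = ratio_buckets([1, 2, 3, 4, 5, 6, 7, 8], 2)
```

The example shows the intended effect: the worst half of the
population shares one bucket, while the two fittest individuals each
get a bucket of their own.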



TARPEIAN SELECTION

Tarpeian is fairly simple but clever. PRIOR to evaluation, we sort the
population by size, then identify the M individuals which have
above-average size. From those M individuals we "kill" some N percent.
Notice that M may vary from population to population depending on the
variance of size among the individuals.

By "kill" we mean that we set the fitness of those individuals to a very
bad value, and also mark them as evaluated so the Evaluator doesn't
bother evaluating them.

Not evaluating the individuals is really important: if every individual
is evaluated, Tarpeian is actually pretty costly compared to other
methods. But if we prematurely "kill" the individuals, then Tarpeian is
pretty competitive if you count total number of evaluations.

Because Tarpeian must do its work prior to evaluation, it can't operate
as a selection operator in ECJ's framework. Instead, we've arranged for
Tarpeian to be a Statistics subclass which plugs into the
preEvaluationStatistics hook. To use it, you just hang it off of your
statistics chain. Assuming you only have one existing Statistics
object, here's how you'd add Tarpeian in a manner which has proven good
across several problem domains:

stat.num-children = 1
stat.child.0 = ec.parsimony.TarpeianStatistics
stat.child.0.kill-proportion = 0.3

Note that our implementation of Tarpeian will operate over all of your
subpopulations, even if you don't want that. You may need to hack it to
operate differently if you have more than one subpopulation and don't
want Tarpeian parsimony on one or more of them.
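The "kill" step described in this section can be sketched in standalone
Python as follows. This is an illustration rather than the
TarpeianStatistics code. The class, field, and function names are
hypothetical, fitness is assumed to be higher-is-better, and choosing
the victims at random among the above-average individuals is an
assumption, since the text above does not say how they are picked.

```python
import random
from dataclasses import dataclass

WORST_FITNESS = float("-inf")  # assumes higher fitness is better


@dataclass
class Individual:
    size: int
    fitness: float = 0.0
    evaluated: bool = False


def tarpeian(pop, kill_proportion=0.3, rng=random):
    """Before evaluation, 'kill' a proportion of the above-average-size
    individuals: give them a very bad fitness and mark them evaluated."""
    mean_size = sum(ind.size for ind in pop) / len(pop)
    oversized = [ind for ind in pop if ind.size > mean_size]
    num_to_kill = int(kill_proportion * len(oversized))
    victims = rng.sample(oversized, num_to_kill)
    for ind in victims:
        ind.fitness = WORST_FITNESS
        ind.evaluated = True  # the evaluator would then skip this one
    return victims
```

An evaluation loop that skips individuals already marked as evaluated
would then spend no effort on the killed individuals, which is where
the savings come from.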