Free cookie consent management tool by TermsFeed Policy Generator

source: branches/OKBJavaConnector/ECJClient/src/ec/gp/README @ 10785

Last change on this file since 10785 was 6152, checked in by bfarka, 14 years ago

added ecj and custom statistics to communicate with the okb services #1441

File size: 8.6 KB
Line 
1This package, and subpackages, comprise the genetic programming facility of
2ECJ.  This is by far ECJ's best-tested facility.
3
4The GP facility is fairly complex and detailed.  You might do well to start
5with the ECJ tutorials, one of which is a GP tutorial.  There are a number of
6GP example applications in the app directory as well (including regression,
7ant, multiplexer, lawnmower, etc.)
8
9The gp.params file contains certain parameters common to gp problems.  But more
10useful to you will be the koza.params file, located in the 'koza' directory,
11which contains an extensive collection of default parameter settings common to
12the GP problems often found in John Koza's works.  Most of the application
13examples rely on the koza.params files for their defaults.  After doing the
14tutorials, it might be helpful to peruse the koza.params file.
15
16
17
18
19GP Individuals
20--------------
21
22ECJ's GP system is tree-based.  GP individuals use the GPIndividual class or
23a subclass, whose species must be GPSpecies or a subclass.  A GPIndividual
24contains an array of GPTrees, each representing a tree in the individual.
25Commonly there's only one such tree, but there's no reason you can't have
26more (and we've used as many as eleven in the past).  Just as individuals
27share common data in Species, GPTrees share common data in GPTreeConstraints
28objects.  That data is primarily the function set used by the GPTree in
29question and the return type of the tree.  Hanging from GPTrees are trees
30whose nodes are all subclasses of GPNode.  GPNodes contain an array of GPNodes
31(the "children" to that GPNode).  If a GPNode has no children, its array may
32be null rather than empty, to save a bit of memory.  Various GPNodes share
33common data in GPNodeConstraints objects, which largely specify the arity
34of the GPNode in question (that is, the number of children it has), its
35return type, and, for each array slot, the type that children in that slot
36must have their return types type-compatible with.
37
38Each GPNode has a parent.  That parent is usually another GPNode (to which it
39is the child).  The root GPNode has the GPTree as its parent.  Thus GPNode
40and GPTree both implement the GPNodeParent interface.
41
42
43GP Function Sets and Node Builders
44----------------------------------
45
46Each GPTreeConstraints contains a GPFunctionSet object, essentially a
47collection of prototypical GPNodes.  GP trees are constructed from copies
48of these nodes.  GPTreeConstraints also contain a GPNodeBuilder of some kind,
49responsible for actually constructing a brand new GP tree.  Some common
50GPNodeBuilders are located in the koza directory (GrowBuilder, HalfBuilder,
51and FullBuilder, all subclasses of KozaBuilder).  Other GPNodeBuilders are
52located in the build directory.
53
54
55Types
56-----
57
58GPNodes can't just be strung together willy-nilly to form trees.  Not only
59must a tree only contain nodes drawn from that tree's function set, but
60certain constraints are placed on which nodes may serve as children to other
61nodes, or as the root of the tree.  These constraints are called types.  ECJ's
62type facility defines an abstract superclass of types called GPType with two
63concrete subclasses, GPAtomicType and GPSetType.  These types are stored in
64global collections defined within these classes.  Here's how it works.  Each
65GPNode has a return type (either an atomic or set type).  GPNodes which can
66have children also have separate types for each of those child slots.  In order
67for a node to be permitted as a child of another node in a given slot, the
68types must be compatible.  Atomic types with one another are compatible only
69if they are exactly the same type.  Set types contain sets of atomic types, and
70two set types are compatible with one another only if their intersection is
71nonempty.  You can mix atomic and set typing as well: a set type is compatible
72with an atomic type if the atomic type is a member of the set type.   GPTrees
73also specify a type for the root node: a GPNode's return type must be
74compatible with this type for the node to be legal as a root.  The most common
75situation is for no typing to be used at all, or more properly described,
76for all nodes to use a single atomic type.  The koza.params file in the
77koza directory defines seven GPNodeConstraints (representing nodes with
78zero through six children), all of which use the same atomic type everywhere.
79For no good reason the name of that type, in that file, is 'nil'.
80
81The various global cliques (GPNodeConstraints, GPTypes, and GPFunctionSets)
82are all set up initially, and stored for easy access, in GPInitializer, a
83subcass of Initializer which you must use (or a subclas thereof).
84
85
86Problems and Special Kinds of GPNodes
87-------------------------------------
88
89Part of the process of making a GP problem involves the construction of the
90function set used for that problem.  Usually this means directly subclassing
91GPNode several times.  ECJ provides four utility GPNode subclasses which might
92also be useful to you.  The ERC class permits the creation of Ephemeral Random
93Constants (ERCs), essentially GPNodes which describe constant values which
94are created randomly initially but are constant thereafter.  To create them you
95need to subclass ERC and override certain methods.  The ADF class defines
96Automatically Defined Function nodes which cause execution to transfer to
97another GPTree in the individual (an Automatically Defined Function).
98Associated with ADFs are ADFArgument nodes appearing in the corresponding
99Automatically Defined Function tree.  These nodes provide the return values
100to children of the original ADF node.  Last, ADM nodes (for Automatically
101Defined Macro) are like ADFs, only the return values of their ADFArguments
102are computed on-the-fly and dynamically rather than beforehand.
103
104All GP problems must be subclasses of GPProblem, which contains the machinery
105to make ADFs work (including special classes for execution transfer,
106ADFContext and ADFStack, which you'll never need to deal with).
107
108When a GP tree is executed, the root node is handed a GPData object which may
109contain information for the root node to act on, or (more commonly) slots of
110data for it to fill out before it hands the GPData object back.  The root node
111goes on to reuse this GPData object by handing it to its children (and receiving
112it back) and so on down the line.  As part of building a GPProblem, you'll
113probably need to define a GPData subclass.
114
115
116Breeding
117--------
118
119GP has its own traditional breeding pipeline, which is defined entirely in
120the koza.params file as well.  Breeding pipelines often, but are not required
121to, subclass from GPBreedingPipeline, which checks to make certain that the
122individual's species is a GPSpecies.  Common pipelines are found in the
123koza directory: CrossoverPipeline and MutationPipeline.  Other commonly used
124GP pipelines are in the ec/breed directory: ReproductionPipeline and
125MultiBreedingPipeline.
126
127To breed individuals, it's often necessary to select points within a tree
128(to do crossover on, for example).  This is the job of the subclasses of the
129abstract superclass GPNodeSelector.  A single concrete subclass in the 'koza'
130directory, KozaNodeSelector, can pick random nodes in a tree with certain
131probabilities based on how often you want certain kinds of nodes (the root,
132nonleaf nodes, leaf nodes, or all nodes).
133
134Certain utility functions in GPNode rely on an auxillary class, GPNodeGatherer,
135which is largely private and probably should be folded into GPNode.
136
137
138Fitness and Statistics
139----------------------
140
141Koza-style genetic programming has its own traditional fitness function which
142presents the fitness value in different guises.  The raw fitness is set by the
143application problem.  The standardized fitness converts this into a value where
1440 is optimal and infinity is worse than the worst possible.  In ECJ these two
145fitness values are the same thing.  The adjusted fitness is the raw fitness
146converted so that 1 is optimal and 0 is worse than the worst possible.  This is
147defined as adjustedFitness = (1 / (1 + rawFitness)).  Last, the auxillary hits
148measure, mostly for historical reasons, indicates how many successful tests were
149made for certain kinds of problems.
150
151Because of this fitness complexity, ECJ has its own Fitness subclass different
152from SimpleFitness which is used for GP problems: KozaFitness (in the 'koza'
153directory).  You can use whatever fitness you want for your problem, but
154KozaFitness may be convenient.
155
156Associated with KozaFitness are two revised Statistics subclasses which provide
157a variety of useful data special to GP.  KozaStatistics logs the best trees
158of each run, plus certain fitness information.  KozaShortStatistics logs a
159variety of easily parsed statistical information, one line per generation.
160Both are located in the 'koza' directory.
161
162
Note: See TracBrowser for help on using the repository browser.