1 | This package, and subpackages, comprise the genetic programming facility of |
---|
2 | ECJ. This is by far ECJ's best-tested facility. |
---|
3 | |
---|
4 | The GP facility is fairly complex and detailed. You might do well to start |
---|
5 | with the ECJ tutorials, one of which is a GP tutorial. There are a number of |
---|
6 | GP example applications in the app directory as well (including regression, |
---|
7 | ant, multiplexer, lawnmower, etc.) |
---|
8 | |
---|
9 | The gp.params file contains certain parameters common to gp problems. But more |
---|
10 | useful to you will be the koza.params file, located in the 'koza' directory, |
---|
11 | which contains an extensive collection of default parameter settings common to |
---|
12 | the GP problems often found in John Koza's works. Most of the application |
---|
13 | examples rely on the koza.params files for their defaults. After doing the |
---|
14 | tutorials, it might be helpful to peruse the koza.params file. |
---|
15 | |
---|
16 | |
---|
17 | |
---|
18 | |
---|
19 | GP Individuals |
---|
20 | -------------- |
---|
21 | |
---|
22 | ECJ's GP system is tree-based. GP individuals use the GPIndividual class or |
---|
23 | a subclass, whose species must be GPSpecies or a subclass. A GPIndividual |
---|
24 | contains an array of GPTrees, each representing a tree in the individual. |
---|
25 | Commonly there's only one such tree, but there's no reason you can't have |
---|
26 | more (and we've used as many as eleven in the past). Just as individuals |
---|
27 | share common data in Species, GPTrees share common data in GPTreeConstraints |
---|
28 | objects. That data is primarily the function set used by the GPTree in |
---|
29 | question and the return type of the tree. Hanging from GPTrees are trees |
---|
30 | whose nodes are all subclasses of GPNode. GPNodes contain an array of GPNodes |
---|
31 | (the "children" to that GPNode). If a GPNode has no children, its array may |
---|
32 | be null rather than empty, to save a bit of memory. Various GPNodes share |
---|
33 | common data in GPNodeConstraints objects, which largely specify the arity |
---|
34 | of the GPNode in question (that is, the number of children it has), its |
---|
35 | return type, and, for each array slot, the type that children in that slot |
---|
36 | must have their return types type-compatible with. |
---|
37 | |
---|
38 | Each GPNode has a parent. That parent is usually another GPNode (to which it |
---|
39 | is the child). The root GPNode has the GPTree as its parent. Thus GPNode |
---|
40 | and GPTree both implement the GPNodeParent interface. |
---|
41 | |
---|
42 | |
---|
43 | GP Function Sets and Node Builders |
---|
44 | ---------------------------------- |
---|
45 | |
---|
46 | Each GPTreeConstraints contains a GPFunctionSet object, essentially a |
---|
47 | collection of prototypical GPNodes. GP trees are constructed from copies |
---|
48 | of these nodes. GPTreeConstraints also contain a GPNodeBuilder of some kind, |
---|
49 | responsible for actually constructing a brand new GP tree. Some common |
---|
50 | GPNodeBuilders are located in the koza directory (GrowBuilder, HalfBuilder, |
---|
51 | and FullBuilder, all subclasses of KozaBuilder). Other GPNodeBuilders are |
---|
52 | located in the build directory. |
---|
53 | |
---|
54 | |
---|
55 | Types |
---|
56 | ----- |
---|
57 | |
---|
58 | GPNodes can't just be strung together willy-nilly to form trees. Not only |
---|
59 | must a tree only contain nodes drawn from that tree's function set, but |
---|
60 | certain constraints are placed on which nodes may serve as children to other |
---|
61 | nodes, or as the root of the tree. These constraints are called types. ECJ's |
---|
62 | type facility defines an abstract superclass of types called GPType with two |
---|
63 | concrete subclasses, GPAtomicType and GPSetType. These types are stored in |
---|
64 | global collections defined within these classes. Here's how it works. Each |
---|
65 | GPNode has a return type (either an atomic or set type). GPNodes which can |
---|
66 | have children also have separate types for each of those child slots. In order |
---|
67 | for a node to be permitted as a child of another node in a given slot, the |
---|
68 | types must be compatible. Atomic types with one another are compatible only |
---|
69 | if they are exactly the same type. Set types contain sets of atomic types, and |
---|
70 | two set types are compatible with one another only if their intersection is |
---|
71 | nonempty. You can mix atomic and set typing as well: a set type is compatible |
---|
72 | with an atomic type if the atomic type is a member of the set type. GPTrees |
---|
73 | also specify a type for the root node: a GPNode's return type must be |
---|
74 | compatible with this type for the node to be legal as a root. The most common |
---|
75 | situation is for no typing to be used at all, or more properly described, |
---|
76 | for all nodes to use a single atomic type. The koza.params file in the |
---|
77 | koza directory defines seven GPNodeConstraints (representing nodes with |
---|
78 | zero through six children), all of which use the same atomic type everywhere. |
---|
79 | For no good reason the name of that type, in that file, is 'nil'. |
---|
80 | |
---|
81 | The various global cliques (GPNodeConstraints, GPTypes, and GPFunctionSets) |
---|
82 | are all set up initially, and stored for easy access, in GPInitializer, a |
---|
83 | subcass of Initializer which you must use (or a subclas thereof). |
---|
84 | |
---|
85 | |
---|
86 | Problems and Special Kinds of GPNodes |
---|
87 | ------------------------------------- |
---|
88 | |
---|
89 | Part of the process of making a GP problem involves the construction of the |
---|
90 | function set used for that problem. Usually this means directly subclassing |
---|
91 | GPNode several times. ECJ provides four utility GPNode subclasses which might |
---|
92 | also be useful to you. The ERC class permits the creation of Ephemeral Random |
---|
93 | Constants (ERCs), essentially GPNodes which describe constant values which |
---|
94 | are created randomly initially but are constant thereafter. To create them you |
---|
95 | need to subclass ERC and override certain methods. The ADF class defines |
---|
96 | Automatically Defined Function nodes which cause execution to transfer to |
---|
97 | another GPTree in the individual (an Automatically Defined Function). |
---|
98 | Associated with ADFs are ADFArgument nodes appearing in the corresponding |
---|
99 | Automatically Defined Function tree. These nodes provide the return values |
---|
100 | to children of the original ADF node. Last, ADM nodes (for Automatically |
---|
101 | Defined Macro) are like ADFs, only the return values of their ADFArguments |
---|
102 | are computed on-the-fly and dynamically rather than beforehand. |
---|
103 | |
---|
104 | All GP problems must be subclasses of GPProblem, which contains the machinery |
---|
105 | to make ADFs work (including special classes for execution transfer, |
---|
106 | ADFContext and ADFStack, which you'll never need to deal with). |
---|
107 | |
---|
108 | When a GP tree is executed, the root node is handed a GPData object which may |
---|
109 | contain information for the root node to act on, or (more commonly) slots of |
---|
110 | data for it to fill out before it hands the GPData object back. The root node |
---|
111 | goes on to reuse this GPData object by handing it to its children (and receiving |
---|
112 | it back) and so on down the line. As part of building a GPProblem, you'll |
---|
113 | probably need to define a GPData subclass. |
---|
114 | |
---|
115 | |
---|
116 | Breeding |
---|
117 | -------- |
---|
118 | |
---|
119 | GP has its own traditional breeding pipeline, which is defined entirely in |
---|
120 | the koza.params file as well. Breeding pipelines often, but are not required |
---|
121 | to, subclass from GPBreedingPipeline, which checks to make certain that the |
---|
122 | individual's species is a GPSpecies. Common pipelines are found in the |
---|
123 | koza directory: CrossoverPipeline and MutationPipeline. Other commonly used |
---|
124 | GP pipelines are in the ec/breed directory: ReproductionPipeline and |
---|
125 | MultiBreedingPipeline. |
---|
126 | |
---|
127 | To breed individuals, it's often necessary to select points within a tree |
---|
128 | (to do crossover on, for example). This is the job of the subclasses of the |
---|
129 | abstract superclass GPNodeSelector. A single concrete subclass in the 'koza' |
---|
130 | directory, KozaNodeSelector, can pick random nodes in a tree with certain |
---|
131 | probabilities based on how often you want certain kinds of nodes (the root, |
---|
132 | nonleaf nodes, leaf nodes, or all nodes). |
---|
133 | |
---|
134 | Certain utility functions in GPNode rely on an auxillary class, GPNodeGatherer, |
---|
135 | which is largely private and probably should be folded into GPNode. |
---|
136 | |
---|
137 | |
---|
138 | Fitness and Statistics |
---|
139 | ---------------------- |
---|
140 | |
---|
141 | Koza-style genetic programming has its own traditional fitness function which |
---|
142 | presents the fitness value in different guises. The raw fitness is set by the |
---|
143 | application problem. The standardized fitness converts this into a value where |
---|
144 | 0 is optimal and infinity is worse than the worst possible. In ECJ these two |
---|
145 | fitness values are the same thing. The adjusted fitness is the raw fitness |
---|
146 | converted so that 1 is optimal and 0 is worse than the worst possible. This is |
---|
147 | defined as adjustedFitness = (1 / (1 + rawFitness)). Last, the auxillary hits |
---|
148 | measure, mostly for historical reasons, indicates how many successful tests were |
---|
149 | made for certain kinds of problems. |
---|
150 | |
---|
151 | Because of this fitness complexity, ECJ has its own Fitness subclass different |
---|
152 | from SimpleFitness which is used for GP problems: KozaFitness (in the 'koza' |
---|
153 | directory). You can use whatever fitness you want for your problem, but |
---|
154 | KozaFitness may be convenient. |
---|
155 | |
---|
156 | Associated with KozaFitness are two revised Statistics subclasses which provide |
---|
157 | a variety of useful data special to GP. KozaStatistics logs the best trees |
---|
158 | of each run, plus certain fitness information. KozaShortStatistics logs a |
---|
159 | variety of easily parsed statistical information, one line per generation. |
---|
160 | Both are located in the 'koza' directory. |
---|
161 | |
---|
162 | |
---|