[6152] | 1 | This package, and subpackages, comprise the genetic programming facility of |
---|
| 2 | ECJ. This is by far ECJ's best-tested facility. |
---|
| 3 | |
---|
| 4 | The GP facility is fairly complex and detailed. You might do well to start |
---|
| 5 | with the ECJ tutorials, one of which is a GP tutorial. There are a number of |
---|
| 6 | GP example applications in the app directory as well (including regression, |
---|
| 7 | ant, multiplexer, lawnmower, etc.) |
---|
| 8 | |
---|
| 9 | The gp.params file contains certain parameters common to gp problems. But more |
---|
| 10 | useful to you will be the koza.params file, located in the 'koza' directory, |
---|
| 11 | which contains an extensive collection of default parameter settings common to |
---|
| 12 | the GP problems often found in John Koza's works. Most of the application |
---|
| 13 | examples rely on the koza.params files for their defaults. After doing the |
---|
| 14 | tutorials, it might be helpful to peruse the koza.params file. |
---|
| 15 | |
---|
| 16 | |
---|
| 17 | |
---|
| 18 | |
---|
| 19 | GP Individuals |
---|
| 20 | -------------- |
---|
| 21 | |
---|
| 22 | ECJ's GP system is tree-based. GP individuals use the GPIndividual class or |
---|
| 23 | a subclass, whose species must be GPSpecies or a subclass. A GPIndividual |
---|
| 24 | contains an array of GPTrees, each representing a tree in the individual. |
---|
| 25 | Commonly there's only one such tree, but there's no reason you can't have |
---|
| 26 | more (and we've used as many as eleven in the past). Just as individuals |
---|
| 27 | share common data in Species, GPTrees share common data in GPTreeConstraints |
---|
| 28 | objects. That data is primarily the function set used by the GPTree in |
---|
| 29 | question and the return type of the tree. Hanging from GPTrees are trees |
---|
| 30 | whose nodes are all subclasses of GPNode. GPNodes contain an array of GPNodes |
---|
| 31 | (the "children" to that GPNode). If a GPNode has no children, its array may |
---|
| 32 | be null rather than empty, to save a bit of memory. Various GPNodes share |
---|
| 33 | common data in GPNodeConstraints objects, which largely specify the arity |
---|
| 34 | of the GPNode in question (that is, the number of children it has), its |
---|
| 35 | return type, and, for each array slot, the type that children in that slot |
---|
| 36 | must have their return types type-compatible with. |
---|
| 37 | |
---|
| 38 | Each GPNode has a parent. That parent is usually another GPNode (to which it |
---|
| 39 | is the child). The root GPNode has the GPTree as its parent. Thus GPNode |
---|
| 40 | and GPTree both implement the GPNodeParent interface. |
---|
| 41 | |
---|
| 42 | |
---|
| 43 | GP Function Sets and Node Builders |
---|
| 44 | ---------------------------------- |
---|
| 45 | |
---|
| 46 | Each GPTreeConstraints contains a GPFunctionSet object, essentially a |
---|
| 47 | collection of prototypical GPNodes. GP trees are constructed from copies |
---|
| 48 | of these nodes. GPTreeConstraints also contain a GPNodeBuilder of some kind, |
---|
| 49 | responsible for actually constructing a brand new GP tree. Some common |
---|
| 50 | GPNodeBuilders are located in the koza directory (GrowBuilder, HalfBuilder, |
---|
| 51 | and FullBuilder, all subclasses of KozaBuilder). Other GPNodeBuilders are |
---|
| 52 | located in the build directory. |
---|
| 53 | |
---|
| 54 | |
---|
| 55 | Types |
---|
| 56 | ----- |
---|
| 57 | |
---|
| 58 | GPNodes can't just be strung together willy-nilly to form trees. Not only |
---|
| 59 | must a tree only contain nodes drawn from that tree's function set, but |
---|
| 60 | certain constraints are placed on which nodes may serve as children to other |
---|
| 61 | nodes, or as the root of the tree. These constraints are called types. ECJ's |
---|
| 62 | type facility defines an abstract superclass of types called GPType with two |
---|
| 63 | concrete subclasses, GPAtomicType and GPSetType. These types are stored in |
---|
| 64 | global collections defined within these classes. Here's how it works. Each |
---|
| 65 | GPNode has a return type (either an atomic or set type). GPNodes which can |
---|
| 66 | have children also have separate types for each of those child slots. In order |
---|
| 67 | for a node to be permitted as a child of another node in a given slot, the |
---|
| 68 | types must be compatible. Atomic types with one another are compatible only |
---|
| 69 | if they are exactly the same type. Set types contain sets of atomic types, and |
---|
| 70 | two set types are compatible with one another only if their intersection is |
---|
| 71 | nonempty. You can mix atomic and set typing as well: a set type is compatible |
---|
| 72 | with an atomic type if the atomic type is a member of the set type. GPTrees |
---|
| 73 | also specify a type for the root node: a GPNode's return type must be |
---|
| 74 | compatible with this type for the node to be legal as a root. The most common |
---|
| 75 | situation is for no typing to be used at all, or more properly described, |
---|
| 76 | for all nodes to use a single atomic type. The koza.params file in the |
---|
| 77 | koza directory defines seven GPNodeConstraints (representing nodes with |
---|
| 78 | zero through six children), all of which use the same atomic type everywhere. |
---|
| 79 | For no good reason the name of that type, in that file, is 'nil'. |
---|
| 80 | |
---|
| 81 | The various global cliques (GPNodeConstraints, GPTypes, and GPFunctionSets) |
---|
| 82 | are all set up initially, and stored for easy access, in GPInitializer, a |
---|
| 83 | subcass of Initializer which you must use (or a subclas thereof). |
---|
| 84 | |
---|
| 85 | |
---|
| 86 | Problems and Special Kinds of GPNodes |
---|
| 87 | ------------------------------------- |
---|
| 88 | |
---|
| 89 | Part of the process of making a GP problem involves the construction of the |
---|
| 90 | function set used for that problem. Usually this means directly subclassing |
---|
| 91 | GPNode several times. ECJ provides four utility GPNode subclasses which might |
---|
| 92 | also be useful to you. The ERC class permits the creation of Ephemeral Random |
---|
| 93 | Constants (ERCs), essentially GPNodes which describe constant values which |
---|
| 94 | are created randomly initially but are constant thereafter. To create them you |
---|
| 95 | need to subclass ERC and override certain methods. The ADF class defines |
---|
| 96 | Automatically Defined Function nodes which cause execution to transfer to |
---|
| 97 | another GPTree in the individual (an Automatically Defined Function). |
---|
| 98 | Associated with ADFs are ADFArgument nodes appearing in the corresponding |
---|
| 99 | Automatically Defined Function tree. These nodes provide the return values |
---|
| 100 | to children of the original ADF node. Last, ADM nodes (for Automatically |
---|
| 101 | Defined Macro) are like ADFs, only the return values of their ADFArguments |
---|
| 102 | are computed on-the-fly and dynamically rather than beforehand. |
---|
| 103 | |
---|
| 104 | All GP problems must be subclasses of GPProblem, which contains the machinery |
---|
| 105 | to make ADFs work (including special classes for execution transfer, |
---|
| 106 | ADFContext and ADFStack, which you'll never need to deal with). |
---|
| 107 | |
---|
| 108 | When a GP tree is executed, the root node is handed a GPData object which may |
---|
| 109 | contain information for the root node to act on, or (more commonly) slots of |
---|
| 110 | data for it to fill out before it hands the GPData object back. The root node |
---|
| 111 | goes on to reuse this GPData object by handing it to its children (and receiving |
---|
| 112 | it back) and so on down the line. As part of building a GPProblem, you'll |
---|
| 113 | probably need to define a GPData subclass. |
---|
| 114 | |
---|
| 115 | |
---|
| 116 | Breeding |
---|
| 117 | -------- |
---|
| 118 | |
---|
| 119 | GP has its own traditional breeding pipeline, which is defined entirely in |
---|
| 120 | the koza.params file as well. Breeding pipelines often, but are not required |
---|
| 121 | to, subclass from GPBreedingPipeline, which checks to make certain that the |
---|
| 122 | individual's species is a GPSpecies. Common pipelines are found in the |
---|
| 123 | koza directory: CrossoverPipeline and MutationPipeline. Other commonly used |
---|
| 124 | GP pipelines are in the ec/breed directory: ReproductionPipeline and |
---|
| 125 | MultiBreedingPipeline. |
---|
| 126 | |
---|
| 127 | To breed individuals, it's often necessary to select points within a tree |
---|
| 128 | (to do crossover on, for example). This is the job of the subclasses of the |
---|
| 129 | abstract superclass GPNodeSelector. A single concrete subclass in the 'koza' |
---|
| 130 | directory, KozaNodeSelector, can pick random nodes in a tree with certain |
---|
| 131 | probabilities based on how often you want certain kinds of nodes (the root, |
---|
| 132 | nonleaf nodes, leaf nodes, or all nodes). |
---|
| 133 | |
---|
| 134 | Certain utility functions in GPNode rely on an auxillary class, GPNodeGatherer, |
---|
| 135 | which is largely private and probably should be folded into GPNode. |
---|
| 136 | |
---|
| 137 | |
---|
| 138 | Fitness and Statistics |
---|
| 139 | ---------------------- |
---|
| 140 | |
---|
| 141 | Koza-style genetic programming has its own traditional fitness function which |
---|
| 142 | presents the fitness value in different guises. The raw fitness is set by the |
---|
| 143 | application problem. The standardized fitness converts this into a value where |
---|
| 144 | 0 is optimal and infinity is worse than the worst possible. In ECJ these two |
---|
| 145 | fitness values are the same thing. The adjusted fitness is the raw fitness |
---|
| 146 | converted so that 1 is optimal and 0 is worse than the worst possible. This is |
---|
| 147 | defined as adjustedFitness = (1 / (1 + rawFitness)). Last, the auxillary hits |
---|
| 148 | measure, mostly for historical reasons, indicates how many successful tests were |
---|
| 149 | made for certain kinds of problems. |
---|
| 150 | |
---|
| 151 | Because of this fitness complexity, ECJ has its own Fitness subclass different |
---|
| 152 | from SimpleFitness which is used for GP problems: KozaFitness (in the 'koza' |
---|
| 153 | directory). You can use whatever fitness you want for your problem, but |
---|
| 154 | KozaFitness may be convenient. |
---|
| 155 | |
---|
| 156 | Associated with KozaFitness are two revised Statistics subclasses which provide |
---|
| 157 | a variety of useful data special to GP. KozaStatistics logs the best trees |
---|
| 158 | of each run, plus certain fitness information. KozaShortStatistics logs a |
---|
| 159 | variety of easily parsed statistical information, one line per generation. |
---|
| 160 | Both are located in the 'koza' directory. |
---|
| 161 | |
---|
| 162 | |
---|