Opened 5 years ago
Last modified 3 years ago
#3039 closed feature request
BTC (Balanced Tree Creator) for HeuristicLab — at Initial Version
Reported by: | bburlacu | Owned by: | bburlacu |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.17 |
Component: | Encodings.SymbolicExpressionTreeEncoding | Version: | trunk |
Keywords: | genetic programming, tree initialization | Cc: |
Description
I would like to propose a new tree creation algorithm with two main advantages over PTC2 (Probabilistic Tree Creator):
- much simpler implementation (tree creation logic fits in 30 lines of code)
- produces trees of minimal depth (very well-balanced trees), no need to tune the maximum depth parameter
Like the PTC2, the BTC is able to follow a user-specified distribution of tree lengths as well as desired symbol frequencies.
The idea behind the algorithm is to keep expanding a "horizon" of open slots (expansion points) in breadth-wise fashion while keeping track of the difference to the target length.
Sample implementation:
public static ISymbolicExpressionTree CreateExpressionTree(IRandom random, ISymbolicExpressionGrammar grammar, int targetLength, int maxDepth) { var allowedSymbols = grammar.AllowedSymbols.Where(x => x.InitialFrequency > 0.0).ToList(); var tree = MakeStump(random, grammar); var tuples = new List<NodeInfo>(targetLength) { new NodeInfo { Node = tree.Root, Depth = 0, Arity = 1 }, new NodeInfo { Node = tree.Root.GetSubtree(0), Depth = 1, Arity = 1 } }; int remLength = targetLength - 2; // remaining length; I already have a root and start node int openSlots = 1; // startNode has arity 1 for (int i = 1; i < tuples.Count; ++i) { var t = tuples[i]; var node = t.Node; for (int j = 0; j < t.Arity; ++j) { // min and max arity here refer to the required arity limits for the child node int maxChildArity = t.Depth == maxDepth - 1 ? 0 : Math.Min(GetMaxChildArity(grammar, allowedSymbols, node.Symbol), remLength - openSlots); int minChildArity = Math.Min(1, maxChildArity); var child = SampleNode(random, grammar, allowedSymbols, node.Symbol, minChildArity, maxChildArity); var childArity = random.Next(grammar.GetMinimumSubtreeCount(child.Symbol), grammar.GetMaximumSubtreeCount(child.Symbol) + 1); var childDepth = t.Depth + 1; node.AddSubtree(child); tuples.Add(new NodeInfo { Node = child, Depth = childDepth, Arity = childArity }); remLength--; openSlots += childArity - 1; } } return tree; }
Preliminary test: BTC vs PTC2 with 100K trees with uniform length from U(1,100) and max depth 100, and a grammar G={+,-,*,/,exp,log,constant}:
- BTC:
Function distribution: ProgramRootSymbol: 3.22% StartSymbol: 3.22% Logarithm: 16.23% Division: 15.30% Exponential: 16.22% Subtraction: 15.29% Addition: 15.27% Multiplication: 15.25% Distribution of function arities: 01:00 23.66% 02:00 37.19% 00:00 39.15% Terminal distribution: Constant: 100.00% Average tree depth: 9 Average tree length: 51.02238 Total nodes created: 5102238 Average shape: 404.66003
- PTC2:
Function distribution: ProgramRootSymbol: 3.16% StartSymbol: 3.16% Exponential: 16.22% Division: 15.32% Logarithm: 16.25% Addition: 15.28% Multiplication: 15.29% Subtraction: 15.34% Distribution of function arities: 01:00 23.59% 02:00 37.24% 00:00 39.16% Terminal distribution: Constant: 100.00% Average tree depth: 13 Average tree length: 52.05025 Total nodes created: 5205025 Average shape: 507.90366
- The numbers above are nearly identical between BTC and PTC2
- The average shape calculated as the nested tree length averaged over all generated trees shows that BTC produces more balanced trees
- Performance wise, BTC with a rudimentary cache for grammar queries achieves a speed of ~7200 trees/s, while PTC2 achieves a speed of ~1600 trees/s.
BTC Tree Length Distribution