Opened 19 months ago

# BTC (Balanced Tree Creator) for HeuristicLab

Reported by: Owned by: bburlacu bburlacu medium HeuristicLab 3.3.17 Encodings.SymbolicExpressionTreeEncoding trunk genetic programming, tree initialization

I would like to propose a new tree creation algorithm with two main advantages over PTC2 (Probabilistic Tree Creator):

• much simpler implementation (tree creation logic fits in 30 lines of code)
• produces trees of minimal depth (very well-balanced trees), no need to tune the maximum depth parameter

Like the PTC2, the BTC is able to follow a user-specified distribution of tree lengths as well as desired symbol frequencies.

The idea behind the algorithm is to keep expanding a "horizon" of open slots (expansion points) in breadth-wise fashion while keeping track of the difference to the target length.

Sample implementation:

public static ISymbolicExpressionTree CreateExpressionTree(IRandom random, ISymbolicExpressionGrammar grammar, int targetLength, int maxDepth) {
var allowedSymbols = grammar.AllowedSymbols.Where(x => x.InitialFrequency > 0.0).ToList();
var tree = MakeStump(random, grammar);
var tuples = new List<NodeInfo>(targetLength) {
new NodeInfo { Node = tree.Root, Depth = 0, Arity = 1 },
new NodeInfo { Node = tree.Root.GetSubtree(0), Depth = 1, Arity = 1 }
};
int remLength = targetLength - 2; // remaining length; I already have a root and start node
int openSlots = 1; // startNode has arity 1

for (int i = 1; i < tuples.Count; ++i) {
var t = tuples[i];
var node = t.Node;
for (int j = 0; j < t.Arity; ++j) {
// min and max arity here refer to the required arity limits for the child node
int maxChildArity = t.Depth == maxDepth - 1 ? 0 : Math.Min(GetMaxChildArity(grammar, allowedSymbols, node.Symbol), remLength - openSlots);
int minChildArity = Math.Min(1, maxChildArity);
var child = SampleNode(random, grammar, allowedSymbols, node.Symbol, minChildArity, maxChildArity);
var childArity = random.Next(grammar.GetMinimumSubtreeCount(child.Symbol), grammar.GetMaximumSubtreeCount(child.Symbol) + 1);
var childDepth = t.Depth + 1;

tuples.Add(new NodeInfo { Node = child, Depth = childDepth, Arity = childArity });

remLength--;
openSlots += childArity - 1;
}
}
return tree;
}


Preliminary test: BTC vs PTC2 with 100K trees with uniform length from U(1,100) and max depth 100, and a grammar G={+,-,*,/,exp,log,constant}:

• BTC:
Function distribution:
ProgramRootSymbol:	3.22%
StartSymbol:	3.22%
Logarithm:	16.23%
Division:	15.30%
Exponential:	16.22%
Subtraction:	15.29%
Multiplication:	15.25%

Distribution of function arities:
01:00	23.66%
02:00	37.19%
00:00	39.15%

Terminal distribution:
Constant:	100.00%
Average tree depth:	9
Average tree length:	51.02238
Total nodes created:	5102238
Average shape:	404.66003

• PTC2:
Function distribution:
ProgramRootSymbol:	3.16%
StartSymbol:	3.16%
Exponential:	16.22%
Division:	15.32%
Logarithm:	16.25%
Multiplication:	15.29%
Subtraction:	15.34%

Distribution of function arities:
01:00	23.59%
02:00	37.24%
00:00	39.16%

Terminal distribution:
Constant:	100.00%
Average tree depth:	13
Average tree length:	52.05025
Total nodes created:	5205025
Average shape:	507.90366

• The numbers above are nearly identical between BTC and PTC2
• The average shape calculated as the nested tree length averaged over all generated trees shows that BTC produces more balanced trees
• Performance wise, BTC with a rudimentary cache for grammar queries achieves a speed of ~7200 trees/s, while PTC2 achieves a speed of ~1600 trees/s.
• Distribution of tree lengths:

### Changed 19 months ago by bburlacu

BTC Tree Length Distribution

### Changed 19 months ago by bburlacu

PTC2 Tree Length Distribution

### comment:1 Changed 19 months ago by bburlacu

• Description modified (diff)
• Status changed from new to accepted

### comment:2 Changed 19 months ago by bburlacu

r17344: Initial implementation.

TODO: Support ADF and fixing parts of the tree (or using predefined subtrees as symbols).

### comment:3 Changed 19 months ago by bburlacu

r17345: Small bugfix.

### comment:4 Changed 19 months ago by bburlacu

r17347: Enable construction of subtrees from an arbitrary root node. Introduce IrregularityBias parameter as a means to bias tree initialization towards less balanced/regular shapes.

Version 1, edited 19 months ago by bburlacu (previous) (next) (diff)

### Changed 19 months ago by bburlacu

Tree length vs shape

### comment:5 Changed 15 months ago by bburlacu

r17437: Refactor code and fix bug in sampling random child symbols from the grammar.

r17441: Fix compile error.

Last edited 15 months ago by bburlacu (previous) (diff)
Note: See TracTickets for help on using tickets.