Opened 3 weeks ago

Last modified 2 weeks ago

#3039 accepted feature request

BTC (Balanced Tree Creator) for HeuristicLab

Reported by: bburlacu Owned by: bburlacu
Priority: medium Milestone: HeuristicLab 3.3.17
Component: Encodings.SymbolicExpressionTreeEncoding Version: trunk
Keywords: genetic programming, tree initialization Cc:

Description (last modified by bburlacu)

I would like to propose a new tree creation algorithm with two main advantages over PTC2 (Probabilistic Tree Creator):

  • much simpler implementation (tree creation logic fits in 30 lines of code)
  • produces trees of minimal depth (very well-balanced trees), no need to tune the maximum depth parameter

Like the PTC2, the BTC is able to follow a user-specified distribution of tree lengths as well as desired symbol frequencies.

The idea behind the algorithm is to keep expanding a "horizon" of open slots (expansion points) in breadth-wise fashion while keeping track of the difference to the target length.

Sample implementation:

public static ISymbolicExpressionTree CreateExpressionTree(IRandom random, ISymbolicExpressionGrammar grammar, int targetLength, int maxDepth) {
  var allowedSymbols = grammar.AllowedSymbols.Where(x => x.InitialFrequency > 0.0).ToList();
  var tree = MakeStump(random, grammar);
  var tuples = new List<NodeInfo>(targetLength) {
    new NodeInfo { Node = tree.Root, Depth = 0, Arity = 1 },
    new NodeInfo { Node = tree.Root.GetSubtree(0), Depth = 1, Arity = 1 }
  };
  int remLength = targetLength - 2; // remaining length; I already have a root and start node
  int openSlots = 1; // startNode has arity 1

  for (int i = 1; i < tuples.Count; ++i) {
    var t = tuples[i];
    var node = t.Node;
    for (int j = 0; j < t.Arity; ++j) {
      // min and max arity here refer to the required arity limits for the child node
      int maxChildArity = t.Depth == maxDepth - 1 ? 0 : Math.Min(GetMaxChildArity(grammar, allowedSymbols, node.Symbol), remLength - openSlots);
      int minChildArity = Math.Min(1, maxChildArity);
      var child = SampleNode(random, grammar, allowedSymbols, node.Symbol, minChildArity, maxChildArity);
      var childArity = random.Next(grammar.GetMinimumSubtreeCount(child.Symbol), grammar.GetMaximumSubtreeCount(child.Symbol) + 1);
      var childDepth = t.Depth + 1;

      node.AddSubtree(child);
      tuples.Add(new NodeInfo { Node = child, Depth = childDepth, Arity = childArity });

      remLength--;
      openSlots += childArity - 1;
    }
  }
  return tree;
}  

Preliminary test: BTC vs PTC2 with 100K trees with uniform length from U(1,100) and max depth 100, and a grammar G={+,-,*,/,exp,log,constant}:

  • BTC:
    Function distribution:	
    ProgramRootSymbol:	3.22%
    StartSymbol:	3.22%
    Logarithm:	16.23%
    Division:	15.30%
    Exponential:	16.22%
    Subtraction:	15.29%
    Addition:	15.27%
    Multiplication:	15.25%
    
    Distribution of function arities:	
    01:00	23.66%
    02:00	37.19%
    00:00	39.15%
    
    Terminal distribution:
    Constant:	100.00%
    Average tree depth:	9
    Average tree length:	51.02238
    Total nodes created:	5102238
    Average shape:	404.66003
    
  • PTC2:
    Function distribution:	
    ProgramRootSymbol:	3.16%
    StartSymbol:	3.16%
    Exponential:	16.22%
    Division:	15.32%
    Logarithm:	16.25%
    Addition:	15.28%
    Multiplication:	15.29%
    Subtraction:	15.34%
    
    Distribution of function arities:	
    01:00	23.59%
    02:00	37.24%
    00:00	39.16%
    
    Terminal distribution:
    Constant:	100.00%
    Average tree depth:	13
    Average tree length:	52.05025
    Total nodes created:	5205025
    Average shape:	507.90366
    
  • The numbers above are nearly identical between BTC and PTC2
  • The average shape calculated as the nested tree length averaged over all generated trees shows that BTC produces more balanced trees
  • Performance wise, BTC with a rudimentary cache for grammar queries achieves a speed of ~7200 trees/s, while PTC2 achieves a speed of ~1600 trees/s.
  • Distribution of tree lengths:

BTC Tree Length Distribution PTC2 Tree Length Distribution

Attachments (3)

BTC.png (37.0 KB) - added by bburlacu 3 weeks ago.
BTC Tree Length Distribution
PTC2.png (37.3 KB) - added by bburlacu 3 weeks ago.
PTC2 Tree Length Distribution
tree_initialization_shape_vs_length.png (78.0 KB) - added by bburlacu 2 weeks ago.
Tree length vs shape

Download all attachments as: .zip

Change History (7)

Changed 3 weeks ago by bburlacu

BTC Tree Length Distribution

Changed 3 weeks ago by bburlacu

PTC2 Tree Length Distribution

comment:1 Changed 3 weeks ago by bburlacu

  • Description modified (diff)
  • Status changed from new to accepted

comment:2 Changed 3 weeks ago by bburlacu

r17344: Initial implementation.

TODO: Support ADF and fixing parts of the tree (or using predefined subtrees as symbols).

comment:3 Changed 3 weeks ago by bburlacu

r17345: Small bugfix.

comment:4 Changed 2 weeks ago by bburlacu

r17347: Enable construction of subtrees from an arbitrary root node. Introduce IrregularityBias parameter as a means to bias tree initialization towards less balanced/regular shapes.

Tree length vs shape

Last edited 2 weeks ago by bburlacu (previous) (diff)

Changed 2 weeks ago by bburlacu

Tree length vs shape

Note: See TracTickets for help on using tickets.