Opened 3 years ago

Closed 2 years ago

#2215 closed enhancement (done)

More accurate tree similarity measure based on tree bottom-up distance

Reported by: bburlacu Owned by: mkommend
Priority: medium Milestone: HeuristicLab 3.3.11
Component: Problems.DataAnalysis.Symbolic Version: 3.3.10
Keywords: Cc:

Description (last modified by bburlacu)

Currently we calculate the similarity measure between two trees T1 and T2 based on the size of their largest common subtree:

similarity = 2 * |MaxCommonSubtree| / (|T1| + |T2|)

A better similarity measure is the one based on the bottom-up tree distance as described in [1]. This should be implemented in HeuristicLab.

Consider the following image, in which similar subtrees are marked with red and dissimilar subtrees with blue.

Bottom-up mapping between two trees

In the example above, the size of the maximum common subtree is 5, leading to a similarity value of 2 * 5 / (32 + 34) = 0.15. Bottom-up matching of the two trees results in a mapping of 15 nodes, leading to a similarity value of (2 * 15) / (32 + 34) = 0.45. Therefore, the bottom-up distance can provide a more accurate measure.

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.7958

Attachments (1)

map.png (135.0 KB) - added by bburlacu 3 years ago.
Bottom-up mapping between two trees

Download all attachments as: .zip

Change History (51)

Changed 3 years ago by bburlacu

Bottom-up mapping between two trees

comment:1 Changed 3 years ago by bburlacu

  • Description modified (diff)
  • Status changed from new to assigned

comment:2 Changed 3 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from assigned to reviewing

r11211: Added implementation of the BottomUpTreeDistanceCalculator in branches/HeuristicLab.BottomUpTreeDistance. Will probably need to be cleaned up a bit, but seems to work fine. The algorithm presented in the paper requires common vertices in the compacted graph representation to have the same height. Removed that restriction as it seems unnecessary (the compacted DAG should not be separated by levels).

comment:3 Changed 3 years ago by bburlacu

r11212" Realized that "Height" in the paper might actually refer to the subtree depth (and not it's level in the tree). Changed code according to this interpretation as it makes more sense and seems to produce the correct results.

comment:4 Changed 3 years ago by bburlacu

r11213: Removed missing/unneeded files from project file.

comment:5 Changed 3 years ago by bburlacu

r11215: Code clean-up and small performance improvements.

comment:6 Changed 3 years ago by bburlacu

r11219: Refactored the tree distance calculators as similarity calculators (extending SingleObjectiveSolutionSimilarityCalculator). Removed ISymbolicExpressionTreeDistanceCalculator interface. Made small performance enhancements to the BottomUpSimilarityCalculator. Added unit tests to check correctness and performance of bottom up similarity. Added SingleObjectivePopulationDiversityAnalyzer in the default operators list along with the BottomUpSimilarityCalculator.

comment:7 Changed 3 years ago by bburlacu

r11220: Added a couple more test cases in the matching unit test.

comment:8 Changed 3 years ago by bburlacu

r11221: Fixed incorrect namespace of the BottomUpSimilarityCalculator. Changed signature of ComputeBottomMapping method to take tree nodes as arguments rather than trees, because we should be able to compute the bottom-up distance for any two subtrees. Added internal diversity analyzer based on the bottom-up distance, which computes the average diversity of all the nodes inside a tree individual (basically, it gives a measure of how many subtrees are identical on average inside a tree).

Last edited 3 years ago by bburlacu (previous) (diff)

comment:9 Changed 3 years ago by bburlacu

  • Owner changed from mkommend to bburlacu
  • Status changed from reviewing to assigned

comment:10 Changed 3 years ago by bburlacu

  • Status changed from assigned to accepted

comment:11 Changed 3 years ago by bburlacu

r11224: Added checks for debugging purposes in the BottomUpSimilarityCalculator and refactored the SymbolicDataAnalysisInternalDiversityAnalyzer.

comment:12 Changed 3 years ago by bburlacu

r11225: Fixed a couple of bugs in the BottomUpSimilarityCalculator:

  • the child sequences of matching nodes were not sorted in the same way (one was using StringComparison.Ordinal, the other the default)
  • when creating the mapping it is necessary to walk the nodes of the first tree in decreasing height order (not level-order as the author claims)

Added more elaborate test case in the unit test based on an actual case where the method used to fail (now it works correctly).

comment:13 Changed 3 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from accepted to reviewing

comment:14 Changed 3 years ago by bburlacu

  • Owner changed from mkommend to bburlacu
  • Status changed from reviewing to assigned

comment:15 Changed 3 years ago by bburlacu

r11229: Refactored and simplified DirectedGraph and related components API, simplified the BottomUpSimilarityCalculator by not using a directed graph and vertices but a simpler object so that the similarity calculator is self-contained.

comment:16 Changed 3 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from assigned to reviewing

comment:17 Changed 3 years ago by bburlacu

r11230: Fixed bug in MaxCommonSubtreeSimilarityCalculator and added performance unit test.

comment:18 Changed 3 years ago by bburlacu

r11234: Updated tests project. Fixed a couple of bugs when adding arcs to a vertex. Removed useless events from the directed graph. Worked around invalid operation exception when removing arcs.

comment:19 Changed 3 years ago by mkommend

Reviewing comments:

IDirectedGraph

  • Missing event handler vertex added /removed, arc added / removed

DirectedGraph

  • Empty methods OnVertexChanged, OnArcChanged should be removed
  • Line 101: ToList call is unnecessary
  • Unnecessary casts to specific implementation Vertex in Add and RemoveArc.

Arc

  • In HeuristicLab class events are commonly defined at the end of the class, or at least after the ctors.
  • Changed events should only be fired if the property is changed (missing if clause, e.g. if(value != weight))

Vertex

  • In HeuristicLab class events are commonly defined at the end of the class, or at least after the ctors.
  • Changed events should only be fired if the property is changed (missing if clause, e.g. if(value != weight))
  • private default ctor is unnecessary
  • RemoveArc(vertex): never catch exception, missing null check after usage of SinlgeOrDefault, IMHO this method should be removed and only RemoveArc(arc) should be available.
  • Properties should be kept together (e.g., degrees should not be at the end of the class)

comment:20 Changed 3 years ago by bburlacu

  • Owner changed from mkommend to bburlacu
  • Status changed from reviewing to assigned

comment:21 Changed 3 years ago by bburlacu

r11238: Updated interfaces and graph components according to the reviewer comments (IDirectedGraph, DirectedGraph, Vertex, Arc).

comment:22 Changed 3 years ago by bburlacu

r11239:

  • Renamed BottomUpSimilarityCalculator to BottomUpTreeSimilarityCalculator.
  • Refactored the BottomUpTreeSimilarityCalculator to accept a configurable list of commutative symbols (the children of commutative symbols need to be sorted according to their label).
  • Added MaxCommonSubtreeSimilarityCalculator performance test
  • Updated BottomUpTreeSimilarityCalculator performance test

comment:23 Changed 3 years ago by mkommend

r11243: Removed directed graph implementation from the bottom up tree distance branch.

comment:24 Changed 3 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from assigned to reviewing

r11244: Updated test traits in the bottom up and max common subtree similarity tests.

comment:25 Changed 3 years ago by bburlacu

  • Owner changed from mkommend to abeham

comment:26 Changed 3 years ago by bburlacu

  • Owner changed from abeham to bburlacu
  • Status changed from reviewing to assigned

comment:27 Changed 3 years ago by bburlacu

r11254: Do not add the BottomUpSimilarityCalculator to the problem operators collection. Add it directly to the diversity analyzer.

comment:28 Changed 3 years ago by bburlacu

r11486: Renamed BottomUpTreeSimilarityCalculator to BottomUpSimilarityCalculator, improved performance by 10% by using the SymbolicExpressionTreeNodeComparer for ordering nodes (instead of string.Compare on node.ToString()). Updated the rest of the files accordingly.

comment:29 Changed 3 years ago by bburlacu

r11487: Further improved performance of the BottomUpSimilarityCalculator (100% improvement).

Last edited 3 years ago by bburlacu (previous) (diff)

comment:30 Changed 3 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from assigned to reviewing

comment:31 Changed 3 years ago by mkommend

  • Version changed from 3.3.10 to branch

comment:32 Changed 2 years ago by mkommend

r11887: Merged trunk changes into bottom up similarity calculator branch.

comment:33 Changed 2 years ago by bburlacu

r11888: MaxCommonSubtreeSimilarityCalculator: added properties for matching settings, added storable constructor, fixed bug in similarity calculation.

comment:34 Changed 2 years ago by bburlacu

r11889: Set matching properties when cloning in the BottomUpSimilarityCalculator. Always match variable names in the MaxCommonSubtreeSimilarityCalculator (remove property), add license header to SymbolicExpressionTreeNodeSimilarityComparer.cs.

comment:35 Changed 2 years ago by bburlacu

r11894: Performance optimizations (up to 25% faster).

comment:36 Changed 2 years ago by mkommend

  • Owner changed from mkommend to bburlacu
  • Status changed from reviewing to assigned

comment:37 Changed 2 years ago by bburlacu

r11910:

  • Unified the similarity and matching/equality classes under the same folder.
  • Renamed SymbolicExpressionTreeNodeSimilarityComparer to SymbolicExpressionTreeNodeEqualityComparer, renamed other classes to more descriptive names.
  • Removed unused classes (SymbolicDataAnalysisInternalDiversityAnalyzer.cs, SymbolicExpressionTreeMaxCommonSequenceCalculator.cs
  • Renamed tests and test files.

comment:38 Changed 2 years ago by bburlacu

r11911: Updated tests project file.

comment:39 Changed 2 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from assigned to reviewing

comment:40 Changed 2 years ago by mkommend

  • Owner changed from mkommend to bburlacu
  • Status changed from reviewing to assigned

r11915: Merged changes in bottom up similarity calculator branch in the trunk.

comment:41 Changed 2 years ago by mkommend

Please move the unit tests in the trunk and afterwards delete the according branch.

comment:42 Changed 2 years ago by bburlacu

r11916: Merged unit tests into the trunk.

comment:43 Changed 2 years ago by bburlacu

r11917: Deleted branch.

comment:44 Changed 2 years ago by bburlacu

r11918: Added missing storable constructor to SymbolicExpressionTreeBottomUpSimilarityCalculator.

comment:45 Changed 2 years ago by mkommend

r11921: Corrected access modifier of storable ctor in bottom up similarity calculator.

comment:46 Changed 2 years ago by mkommend

  • Owner changed from bburlacu to mkommend
  • Status changed from assigned to reviewing

comment:47 Changed 2 years ago by mkommend

  • Status changed from reviewing to readytorelease

comment:48 Changed 2 years ago by jkarder

  • Version changed from branch to 3.3.10

comment:49 Changed 2 years ago by jkarder

r11934: removed test attributes from TestMatchedNodes since it is not actually a test method

comment:50 Changed 2 years ago by jkarder

  • Resolution set to done
  • Status changed from readytorelease to closed

r11947: merged r11915:11918, r11921 and r11934 into stable

Note: See TracTickets for help on using tickets.