Opened 9 years ago
Closed 8 years ago
#2215 closed enhancement (done)
More accurate tree similarity measure based on tree bottom-up distance
Reported by: | bburlacu | Owned by: | mkommend |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.11 |
Component: | Problems.DataAnalysis.Symbolic | Version: | 3.3.10 |
Keywords: | Cc: |
Description (last modified by bburlacu)
Currently we calculate the similarity measure between two trees T1 and T2 based on the size of their largest common subtree:
similarity = 2 * |MaxCommonSubtree| / (|T1| + |T2|)
A better similarity measure is the one based on the bottom-up tree distance as described in [1]. This should be implemented in HeuristicLab.
Consider the following image, in which similar subtrees are marked with red and dissimilar subtrees with blue.
In the example above, the size of the maximum common subtree is 5, leading to a similarity value of 2 * 5 / (32 + 34) = 0.15. Bottom-up matching of the two trees results in a mapping of 15 nodes, leading to a similarity value of (2 * 15) / (32 + 34) = 0.45. Therefore, the bottom-up distance can provide a more accurate measure.
[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.7958
Attachments (1)
Change History (51)
Changed 9 years ago by bburlacu
comment:1 Changed 9 years ago by bburlacu
- Description modified (diff)
- Status changed from new to assigned
comment:2 Changed 9 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
r11211: Added implementation of the BottomUpTreeDistanceCalculator in branches/HeuristicLab.BottomUpTreeDistance. Will probably need to be cleaned up a bit, but seems to work fine. The algorithm presented in the paper requires common vertices in the compacted graph representation to have the same height. Removed that restriction as it seems unnecessary (the compacted DAG should not be separated by levels).
comment:3 Changed 9 years ago by bburlacu
r11212" Realized that "Height" in the paper might actually refer to the subtree depth (and not it's level in the tree). Changed code according to this interpretation as it makes more sense and seems to produce the correct results.
comment:4 Changed 9 years ago by bburlacu
r11213: Removed missing/unneeded files from project file.
comment:5 Changed 9 years ago by bburlacu
r11215: Code clean-up and small performance improvements.
comment:6 Changed 9 years ago by bburlacu
r11219: Refactored the tree distance calculators as similarity calculators (extending SingleObjectiveSolutionSimilarityCalculator). Removed ISymbolicExpressionTreeDistanceCalculator interface. Made small performance enhancements to the BottomUpSimilarityCalculator. Added unit tests to check correctness and performance of bottom up similarity. Added SingleObjectivePopulationDiversityAnalyzer in the default operators list along with the BottomUpSimilarityCalculator.
comment:7 Changed 9 years ago by bburlacu
r11220: Added a couple more test cases in the matching unit test.
comment:8 Changed 9 years ago by bburlacu
r11221: Fixed incorrect namespace of the BottomUpSimilarityCalculator. Changed signature of ComputeBottomMapping method to take tree nodes as arguments rather than trees, because we should be able to compute the bottom-up distance for any two subtrees. Added internal diversity analyzer based on the bottom-up distance, which computes the average diversity of all the nodes inside a tree individual (basically, it gives a measure of how many subtrees are identical on average inside a tree).
comment:9 Changed 9 years ago by bburlacu
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
comment:10 Changed 9 years ago by bburlacu
- Status changed from assigned to accepted
comment:11 Changed 9 years ago by bburlacu
r11224: Added checks for debugging purposes in the BottomUpSimilarityCalculator and refactored the SymbolicDataAnalysisInternalDiversityAnalyzer.
comment:12 Changed 9 years ago by bburlacu
r11225: Fixed a couple of bugs in the BottomUpSimilarityCalculator:
- the child sequences of matching nodes were not sorted in the same way (one was using StringComparison.Ordinal, the other the default)
- when creating the mapping it is necessary to walk the nodes of the first tree in decreasing height order (not level-order as the author claims)
Added more elaborate test case in the unit test based on an actual case where the method used to fail (now it works correctly).
comment:13 Changed 9 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from accepted to reviewing
comment:14 Changed 9 years ago by bburlacu
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
comment:15 Changed 9 years ago by bburlacu
r11229: Refactored and simplified DirectedGraph and related components API, simplified the BottomUpSimilarityCalculator by not using a directed graph and vertices but a simpler object so that the similarity calculator is self-contained.
comment:16 Changed 9 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
comment:17 Changed 9 years ago by bburlacu
r11230: Fixed bug in MaxCommonSubtreeSimilarityCalculator and added performance unit test.
comment:18 Changed 9 years ago by bburlacu
r11234: Updated tests project. Fixed a couple of bugs when adding arcs to a vertex. Removed useless events from the directed graph. Worked around invalid operation exception when removing arcs.
comment:19 Changed 9 years ago by mkommend
Reviewing comments:
IDirectedGraph
- Missing event handler vertex added /removed, arc added / removed
DirectedGraph
- Empty methods OnVertexChanged, OnArcChanged should be removed
- Line 101: ToList call is unnecessary
- Unnecessary casts to specific implementation Vertex in Add and RemoveArc.
Arc
- In HeuristicLab class events are commonly defined at the end of the class, or at least after the ctors.
- Changed events should only be fired if the property is changed (missing if clause, e.g. if(value != weight))
Vertex
- In HeuristicLab class events are commonly defined at the end of the class, or at least after the ctors.
- Changed events should only be fired if the property is changed (missing if clause, e.g. if(value != weight))
- private default ctor is unnecessary
- RemoveArc(vertex): never catch exception, missing null check after usage of SinlgeOrDefault, IMHO this method should be removed and only RemoveArc(arc) should be available.
- Properties should be kept together (e.g., degrees should not be at the end of the class)
comment:20 Changed 9 years ago by bburlacu
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
comment:21 Changed 9 years ago by bburlacu
r11238: Updated interfaces and graph components according to the reviewer comments (IDirectedGraph, DirectedGraph, Vertex, Arc).
comment:22 Changed 9 years ago by bburlacu
- Renamed BottomUpSimilarityCalculator to BottomUpTreeSimilarityCalculator.
- Refactored the BottomUpTreeSimilarityCalculator to accept a configurable list of commutative symbols (the children of commutative symbols need to be sorted according to their label).
- Added MaxCommonSubtreeSimilarityCalculator performance test
- Updated BottomUpTreeSimilarityCalculator performance test
comment:23 Changed 9 years ago by mkommend
r11243: Removed directed graph implementation from the bottom up tree distance branch.
comment:24 Changed 9 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
r11244: Updated test traits in the bottom up and max common subtree similarity tests.
comment:25 Changed 9 years ago by bburlacu
- Owner changed from mkommend to abeham
comment:26 Changed 9 years ago by bburlacu
- Owner changed from abeham to bburlacu
- Status changed from reviewing to assigned
comment:27 Changed 9 years ago by bburlacu
r11254: Do not add the BottomUpSimilarityCalculator to the problem operators collection. Add it directly to the diversity analyzer.
comment:28 Changed 8 years ago by bburlacu
r11486: Renamed BottomUpTreeSimilarityCalculator to BottomUpSimilarityCalculator, improved performance by 10% by using the SymbolicExpressionTreeNodeComparer for ordering nodes (instead of string.Compare on node.ToString()). Updated the rest of the files accordingly.
comment:29 Changed 8 years ago by bburlacu
r11487: Further improved performance of the BottomUpSimilarityCalculator (100% improvement).
comment:30 Changed 8 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
comment:31 Changed 8 years ago by mkommend
- Version changed from 3.3.10 to branch
comment:32 Changed 8 years ago by mkommend
r11887: Merged trunk changes into bottom up similarity calculator branch.
comment:33 Changed 8 years ago by bburlacu
r11888: MaxCommonSubtreeSimilarityCalculator: added properties for matching settings, added storable constructor, fixed bug in similarity calculation.
comment:34 Changed 8 years ago by bburlacu
r11889: Set matching properties when cloning in the BottomUpSimilarityCalculator. Always match variable names in the MaxCommonSubtreeSimilarityCalculator (remove property), add license header to SymbolicExpressionTreeNodeSimilarityComparer.cs.
comment:35 Changed 8 years ago by bburlacu
r11894: Performance optimizations (up to 25% faster).
comment:36 Changed 8 years ago by mkommend
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
comment:37 Changed 8 years ago by bburlacu
- Unified the similarity and matching/equality classes under the same folder.
- Renamed SymbolicExpressionTreeNodeSimilarityComparer to SymbolicExpressionTreeNodeEqualityComparer, renamed other classes to more descriptive names.
- Removed unused classes (SymbolicDataAnalysisInternalDiversityAnalyzer.cs, SymbolicExpressionTreeMaxCommonSequenceCalculator.cs
- Renamed tests and test files.
comment:38 Changed 8 years ago by bburlacu
r11911: Updated tests project file.
comment:39 Changed 8 years ago by bburlacu
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
comment:40 Changed 8 years ago by mkommend
- Owner changed from mkommend to bburlacu
- Status changed from reviewing to assigned
r11915: Merged changes in bottom up similarity calculator branch in the trunk.
comment:41 Changed 8 years ago by mkommend
Please move the unit tests in the trunk and afterwards delete the according branch.
comment:42 Changed 8 years ago by bburlacu
r11916: Merged unit tests into the trunk.
comment:43 Changed 8 years ago by bburlacu
r11917: Deleted branch.
comment:44 Changed 8 years ago by bburlacu
r11918: Added missing storable constructor to SymbolicExpressionTreeBottomUpSimilarityCalculator.
comment:45 Changed 8 years ago by mkommend
r11921: Corrected access modifier of storable ctor in bottom up similarity calculator.
comment:46 Changed 8 years ago by mkommend
- Owner changed from bburlacu to mkommend
- Status changed from assigned to reviewing
comment:47 Changed 8 years ago by mkommend
- Status changed from reviewing to readytorelease
comment:48 Changed 8 years ago by jkarder
- Version changed from branch to 3.3.10
comment:49 Changed 8 years ago by jkarder
r11934: removed test attributes from TestMatchedNodes since it is not actually a test method
comment:50 Changed 8 years ago by jkarder
- Resolution set to done
- Status changed from readytorelease to closed
r11947: merged r11915:11918, r11921 and r11934 into stable
Bottom-up mapping between two trees