Opened 6 weeks ago

Last modified 6 weeks ago

#2959 accepted defect

Issues with the bottom-up similarity calculator

Reported by: bburlacu Owned by: bburlacu
Priority: medium Milestone: HeuristicLab 3.3.16
Component: Problems.DataAnalysis.Symbolic Version: 3.3.16
Keywords: Cc:

Description

The bottom-up tree distance was developed by G. Valiente as an efficient and more flexible alternative to the edit distance. The HL implementation SymbolicExpressionTreeBottomUpSimilarityCalculator follows in detail the algorithm described in the paper, but suffers currently from two issues:

1) The returned similarity incorrectly takes into account the ProgramRootSymbol and StartSymbol for each tree, even though these are always the same, leading to artificially inflated similarity measures

2) No option for loose matching of constants and variable weights

3) The paper suggests handling unordered trees (which is the case when we deal with commutative symbols) by sorting the list of graph nodes associated with the children of the two tree nodes being compared for equality. However, any kind of such sorting may deliver incorrect results in rare cases when a shallow comparison (eg by node label or similar property) does not produce the correct order when the children are same. In such cases when the children appear in different respective orders under the two parent nodes being compared sorting fails to produce a consistent ordering and the matching fails.

Change History (1)

comment:1 Changed 6 weeks ago by bburlacu

  • Status changed from new to accepted

r16278: Address the issues described in the ticket.

r16279: Simplify code, improve performance, fix another potential bug in relation with 3).

r16283: Refactor bottom-up similarity calculator methods (main logic stays the same, better implementation), update unit tests (more tests will be added).

Last edited 6 weeks ago by bburlacu (previous) (diff)
Note: See TracTickets for help on using tickets.