Opened 20 months ago

Closed 12 months ago

Last modified 12 months ago

#2959 closed defect (done)

Issues with the bottom-up similarity calculator

Reported by: bburlacu Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.16
Component: Problems.DataAnalysis.Symbolic Version: trunk
Keywords: merged Cc:

Description

The bottom-up tree distance was developed by G. Valiente as an efficient and more flexible alternative to the edit distance. The HL implementation SymbolicExpressionTreeBottomUpSimilarityCalculator follows in detail the algorithm described in the paper, but suffers currently from two issues:

1) The returned similarity incorrectly takes into account the ProgramRootSymbol and StartSymbol for each tree, even though these are always the same, leading to artificially inflated similarity measures

2) No option for loose matching of constants and variable weights

3) The paper suggests handling unordered trees (which is the case when we deal with commutative symbols) by sorting the list of graph nodes associated with the children of the two tree nodes being compared for equality. However, any kind of such sorting may deliver incorrect results in rare cases when a shallow comparison (eg by node label or similar property) does not produce the correct order when the children are same. In such cases when the children appear in different respective orders under the two parent nodes being compared sorting fails to produce a consistent ordering and the matching fails.

Change History (13)

comment:1 Changed 20 months ago by bburlacu

  • Status changed from new to accepted

r16278: Address the issues described in the ticket.

r16279: Simplify code, improve performance, fix another potential bug in relation with 3).

r16283: Refactor bottom-up similarity calculator methods (main logic stays the same, better implementation), update unit tests (more tests will be added).

Last edited 20 months ago by bburlacu (previous) (diff)

comment:2 Changed 19 months ago by gkronber

  • Version changed from 3.3.16 to trunk

Please complete your changes to the trunk until the end of the year.

comment:3 Changed 19 months ago by bburlacu

The similarity calculator is done. I need to add more unit tests and then this ticket will be ready for review.

comment:4 Changed 16 months ago by gkronber

Please add the unit tests.

comment:5 Changed 15 months ago by gkronber

Please make required changes and move to review phase.

comment:6 Changed 15 months ago by bburlacu

  • Owner changed from bburlacu to gkronber
  • Status changed from accepted to reviewing

r16867: Add more unit tests for tree matching, and test bottom-up calculator against hash-based calculator for consistency.

In my opinion, the bottom-up calculator is ready for review. Aside from added unit tests, the calculator successfully validates against the hash-based approach, producing exactly the same average similarity.

comment:7 Changed 15 months ago by bburlacu

r16869: Fix compilation errors.

comment:8 Changed 12 months ago by gkronber

Reviewed the changesets related to this ticket and made a small change with r17077.

comment:9 Changed 12 months ago by gkronber

  • Status changed from reviewing to readytorelease

comment:10 Changed 12 months ago by abeham

r17091: merged revisions 16278, 16279, 16283 to stable

remaining revisions after #2520: 16867, 16869

comment:11 Changed 12 months ago by abeham

  • Keywords depends-2520 added

comment:12 Changed 12 months ago by abeham

  • Keywords merged added; depends-2520 removed
  • Resolution set to done
  • Status changed from readytorelease to closed

r17143: merged 16867, 16869 to stable

comment:13 Changed 12 months ago by gkronber

r17160: merged r17077 from trunk to stable (missed while merging the ticket)

Note: See TracTickets for help on using tickets.