Opened 7 weeks ago

Last modified 6 weeks ago

#2730 reviewing enhancement

Add similarity calculators and equality comparers for encodings

Reported by: abeham Owned by: jkarder
Priority: medium Milestone: HeuristicLab 3.3.15
Component: Encodings Version: 3.3.14
Keywords: Cc:

Description

Commonly used encodings should have a similarity calculator for computation of genotypical distance as well as an equality comparer in order to track unique solutions. I would propose to add the following calculators:

  1. Hamming similarity calculator for (Binary|Integer|Real)VectorEncoding
  2. Euclidean similarity calculator for (Integer|Real)VectorEncoding
  3. Equality comparer for (Integer|Real)VectorEncoding
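As a rough illustration of items 1 and 3, the following is a minimal sketch in Python rather than HeuristicLab's C#. The function names `hamming_similarity`, `vector_key` and `count_unique` are hypothetical and are not taken from the changesets on this ticket. It assumes Hamming similarity means the fraction of positions at which two equal-length vectors agree, and that an equality comparer amounts to element-wise equality with a matching hash.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def hamming_similarity(a: Sequence, b: Sequence) -> float:
    """Fraction of positions where a and b hold the same value (1.0 = identical)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if len(a) == 0:
        # Assumption: two empty vectors are treated as identical.
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def vector_key(v: Iterable[Hashable]) -> Tuple[Hashable, ...]:
    """Hashable key with element-wise equality, standing in for an equality comparer."""
    return tuple(v)


def count_unique(solutions: Iterable[Iterable[Hashable]]) -> int:
    """Number of distinct vectors among the given solutions."""
    return len({vector_key(s) for s in solutions})
```

For example, `hamming_similarity([1, 0, 1, 1], [1, 1, 1, 0])` agrees in two of four positions, and `count_unique` collapses repeated vectors so that only distinct solutions are counted.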

Change History (5)

comment:1 Changed 7 weeks ago by abeham

  • Component changed from ### Undefined ### to Encodings
  • Status changed from new to accepted

comment:2 Changed 7 weeks ago by abeham

  • Owner changed from abeham to jkarder
  • Status changed from accepted to reviewing

r14659:14660: Added similarity calculators and equality comparers, updated project files

comment:3 Changed 7 weeks ago by gkronber

I'd prefer using the terms Hamming distance and Euclidean distance instead of similarity.

comment:4 Changed 6 weeks ago by abeham

If it calculated the Hamming distance, I'd call it that. I could imagine calling it HammingDistanceBasedSimilarityCalculator, but I despise these long names.

Regarding the use of Euclidean distance in similarity calculators: similarity has to be expressed in the range [0;1], where 0 means maximum distance and 1 means no distance, and Euclidean distance has no maximum. In practice we do have bounds on the vector and could normalize the distance with respect to them (either ignoring the cases where the bounds are not respected or truncating values at the bounds), but similarity calculators are those strange classes that are not operators and thus have no ExecutionContext from which to look up parameters. In the end, I think it's more trouble than it's worth.
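The bound-based normalization described in this comment could look roughly like the sketch below. It is written in Python for brevity and is not the code from the changesets. The function name `euclidean_similarity` and the explicit `lower`/`upper` arguments are assumptions: they stand in for the bounds that a calculator without an ExecutionContext cannot look up itself. Out-of-bounds values are truncated at the bounds, which is one of the two options mentioned above.

```python
import math
from typing import Sequence


def euclidean_similarity(
    a: Sequence[float],
    b: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> float:
    """Similarity in [0, 1]: 1 means no distance, 0 means opposite corners of the bounds."""
    if not (len(a) == len(b) == len(lower) == len(upper)):
        raise ValueError("all vectors must have the same length")
    # Squared length of the diagonal of the bounding box, i.e. the largest possible distance.
    max_sq = sum((u - l) ** 2 for l, u in zip(lower, upper))
    if max_sq == 0:
        # Assumption: a degenerate box admits only one point, so everything is identical.
        return 1.0
    sq = 0.0
    for x, y, l, u in zip(a, b, lower, upper):
        # Truncate values that violate the bounds before measuring the distance.
        x = min(max(x, l), u)
        y = min(max(y, l), u)
        sq += (x - y) ** 2
    return 1.0 - math.sqrt(sq / max_sq)
```

The caller has to supply the bounds on every call, which is exactly the inconvenience raised in this comment.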

comment:5 Changed 6 weeks ago by gkronber

Maybe cosine similarity can be used.
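For reference, a minimal Python sketch of cosine similarity follows. The names `cosine_similarity` and `cosine_similarity_01` are hypothetical. Two caveats apply to this suggestion: the raw value lies in [-1, 1], so the linear rescaling to [0, 1] shown here is an assumption rather than something proposed in the ticket, and cosine similarity ignores vector length, so vectors such as (1, 1) and (2, 2) count as fully similar even though they are different solutions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Clamp to guard against floating-point rounding slightly outside [-1, 1].
    return max(-1.0, min(1.0, dot / (norm_a * norm_b)))


def cosine_similarity_01(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity rescaled linearly from [-1, 1] to [0, 1] (an assumed mapping)."""
    return (cosine_similarity(a, b) + 1.0) / 2.0
```

The zero vector needs special handling because its direction is undefined; this sketch raises an error rather than choosing a value.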
