Opened 4 months ago

Last modified 3 months ago

#2730 reviewing enhancement

Add similarity calculators and equality comparers for encodings

Reported by: abeham Owned by: jkarder
Priority: medium Milestone: HeuristicLab 3.3.15
Component: Encodings Version: 3.3.14
Keywords: Cc:


Commonly used encodings should have a similarity calculator for computing genotypic distance, as well as an equality comparer for tracking unique solutions. I propose adding the following:

  1. Hamming similarity calculator for (Binary|Integer|Real)VectorEncoding
  2. Euclidean similarity calculator for (Integer|Real)VectorEncoding
  3. Equality comparer for (Integer|Real)VectorEncoding
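To illustrate items 1 and 3, here is a minimal sketch in Python (HeuristicLab itself is C#, and this is not the code from the changesets). The function names `hamming_similarity` and `count_unique` are hypothetical, and the sketch assumes similarity is the fraction of matching positions and that equality means element-wise equality.

```python
from typing import Iterable, Sequence


def hamming_similarity(a: Sequence, b: Sequence) -> float:
    """Fraction of positions at which a and b hold equal values.

    1.0 means identical vectors, 0.0 means they differ everywhere.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if len(a) == 0:
        return 1.0  # assumption: two empty vectors count as identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def count_unique(solutions: Iterable[Sequence]) -> int:
    """Count distinct solutions under element-wise equality.

    Converting each vector to a tuple makes it hashable, which plays the
    role an equality comparer would play in a hash set.
    """
    return len({tuple(s) for s in solutions})
```

For real-valued vectors this compares elements exactly, so two solutions that differ only by rounding error would count as different.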

Change History (5)

comment:1 Changed 4 months ago by abeham

  • Component changed from ### Undefined ### to Encodings
  • Status changed from new to accepted

comment:2 Changed 4 months ago by abeham

  • Owner changed from abeham to jkarder
  • Status changed from accepted to reviewing

r14659:14660: Added similarity calculators and equality comparers, updated project files

comment:3 Changed 4 months ago by gkronber

I'd prefer using the terms Hamming distance and Euclidean distance instead of similarity.

comment:4 Changed 4 months ago by abeham

If it calculated the Hamming distance, I'd call it that. I could imagine calling it HammingDistanceBasedSimilarityCalculator, but I despise these long names.

Regarding the use of Euclidean distance in similarity calculators, I found that we need to express similarity in the range [0;1], where 0 means maximum distance and 1 means no distance. Euclidean distance, however, has no maximum. In practice we do have bounds on the vector and could normalize the distance with respect to them (either ignoring the case where the bounds are violated or truncating values at the bounds), but similarity calculators are those strange classes that are not operators and thus don't have an ExecutionContext to look up parameters. In the end, I think it's more trouble than it's worth.
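The bound-based normalization described above could look roughly like the following Python sketch. It is only an illustration under assumptions: the name `euclidean_similarity` and the explicit `lower`/`upper` arguments are hypothetical (in HeuristicLab the bounds would have to reach the calculator some other way), and out-of-bounds values are truncated at the bounds.

```python
import math
from typing import Sequence


def euclidean_similarity(a: Sequence[float], b: Sequence[float],
                         lower: Sequence[float], upper: Sequence[float]) -> float:
    """Similarity in [0, 1]: 1 minus the Euclidean distance divided by the
    largest distance possible inside the box [lower, upper]."""
    if not (len(a) == len(b) == len(lower) == len(upper)):
        raise ValueError("all vectors must have the same length")
    sq_dist = 0.0
    sq_max = 0.0
    for x, y, lo, hi in zip(a, b, lower, upper):
        if hi < lo:
            raise ValueError("upper bound below lower bound")
        # Assumption: values outside the bounds are truncated at the bounds.
        cx = min(max(x, lo), hi)
        cy = min(max(y, lo), hi)
        sq_dist += (cx - cy) ** 2
        sq_max += (hi - lo) ** 2  # squared length of the box diagonal
    if sq_max == 0.0:
        return 1.0  # degenerate box: every clamped vector is the same point
    return 1.0 - math.sqrt(sq_dist / sq_max)
```

With this normalization, two opposite corners of the box have similarity 0 and identical vectors have similarity 1. Truncation also means that two distinct vectors lying outside the same bound can end up with similarity 1.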

comment:5 Changed 3 months ago by gkronber

Maybe cosine similarity can be used.
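A rough Python sketch of what that could mean, for illustration only. Cosine similarity lies in [-1, 1], so the linear rescaling to [0, 1] in `cosine_similarity_01` is an assumption, as are both function names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    return max(-1.0, min(1.0, dot / (norm_a * norm_b)))


def cosine_similarity_01(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed rescaling of cosine similarity from [-1, 1] to [0, 1]."""
    return (cosine_similarity(a, b) + 1.0) / 2.0
```

Cosine similarity needs no bounds, but it ignores magnitude: [1, 1] and [2, 2] get the maximum similarity although they are different solutions, and the zero vector has no defined similarity at all.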
