Opened 5 months ago

Closed 10 days ago

#2730 closed enhancement (done)

Add similarity calculators and equality comparers for encodings

Reported by: abeham Owned by: abeham
Priority: medium Milestone: HeuristicLab 3.3.15
Component: Encodings Version: 3.3.14
Keywords: Cc:

Description (last modified by jkarder)

Commonly used encodings should have a similarity calculator for computing genotypic distance, as well as an equality comparer for tracking unique solutions. I propose adding the following:

  1. Hamming similarity calculator for (Binary|Integer|Real)VectorEncoding
  2. Euclidean similarity calculator for (Integer|Real)VectorEncoding
  3. Equality comparer for (Integer|Real)VectorEncoding
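As an illustration of item 1, here is a minimal sketch of a Hamming-style similarity: the fraction of positions at which two vectors hold the same value. It is written in Python for brevity and is not the HeuristicLab implementation; the function name `hamming_similarity` and the choice to treat two empty vectors as identical are assumptions made for this example.

```python
from typing import Sequence


def hamming_similarity(left: Sequence, right: Sequence) -> float:
    """Return the fraction of positions at which both vectors are equal.

    The result lies in [0, 1]: 1 means identical, 0 means every position differs.
    """
    if len(left) != len(right):
        raise ValueError("Vectors must have the same length.")
    if len(left) == 0:
        # Assumption for this sketch: two empty vectors count as identical.
        return 1.0
    matches = sum(1 for a, b in zip(left, right) if a == b)
    return matches / len(left)
```

For example, `hamming_similarity([1, 0, 1, 1], [1, 1, 1, 0])` matches in two of four positions and returns 0.5. For real vectors, exact equality per position is a strict criterion, so a tolerance may be preferable there.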

This ticket depends on #2706.

Change History (11)

comment:1 Changed 5 months ago by abeham

  • Component changed from ### Undefined ### to Encodings
  • Status changed from new to accepted

comment:2 Changed 5 months ago by abeham

  • Owner changed from abeham to jkarder
  • Status changed from accepted to reviewing

r14659:14660: Added similarity calculators and equality comparers, updated project files

comment:3 Changed 5 months ago by gkronber

I'd prefer using the terms Hamming distance and Euclidean distance instead of similarity.

comment:4 Changed 5 months ago by abeham

If it calculated the Hamming distance, I'd call it that. I could imagine calling it HammingDistanceBasedSimilarityCalculator, but I despise these long names.

Regarding the use of Euclidean distance in similarity calculators: similarity has to be expressed in the range [0;1], where 0 means maximum distance and 1 means no distance, but Euclidean distance has no maximum. In practice we do have bounds on the vector and could normalize the distance with respect to them (either ignoring the cases where the bounds are not respected, or truncating values at the bounds). However, similarity calculators are those strange classes that are not operators and thus don't have an ExecutionContext to look up parameters. In the end, I think it's more trouble than it's worth.
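The bounds-based normalization discussed above could look roughly like the following sketch. It is a Python illustration only, not code from HeuristicLab: the function name `bounded_euclidean_similarity` and the explicit `bounds` argument are assumptions, and passing the bounds in by hand sidesteps exactly the parameter-lookup problem the comment describes. Values outside the bounds are truncated to them.

```python
import math
from typing import Sequence, Tuple


def bounded_euclidean_similarity(
    left: Sequence[float],
    right: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
) -> float:
    """Map Euclidean distance to [0, 1] using per-dimension (lower, upper) bounds."""
    if not (len(left) == len(right) == len(bounds)):
        raise ValueError("Vectors and bounds must have the same length.")
    # The largest possible distance is the diagonal of the bounding box.
    max_distance = math.sqrt(sum((hi - lo) ** 2 for lo, hi in bounds))
    if max_distance == 0.0:
        # Empty vectors or a degenerate box: nothing can differ.
        return 1.0

    def clamp(value: float, lo: float, hi: float) -> float:
        # Truncate values that do not respect the bounds.
        return min(max(value, lo), hi)

    distance = math.sqrt(
        sum(
            (clamp(a, lo, hi) - clamp(b, lo, hi)) ** 2
            for a, b, (lo, hi) in zip(left, right, bounds)
        )
    )
    return 1.0 - distance / max_distance
```

With bounds of [0, 1] in each dimension, opposite corners of the box get similarity 0 and identical vectors get similarity 1.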

comment:5 Changed 5 months ago by gkronber

Maybe cosine similarity can be used.
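For reference, a minimal Python sketch of cosine similarity follows. The function name and the error handling are assumptions for illustration, not part of HeuristicLab. Two properties matter for the discussion above: the result lies in [-1, 1] rather than [0, 1], and it is undefined when either vector has zero length in the norm sense. It also ignores magnitude, so vectors pointing in the same direction are maximally similar regardless of how far apart they are.

```python
import math
from typing import Sequence


def cosine_similarity(left: Sequence[float], right: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors, in [-1, 1]."""
    if len(left) != len(right):
        raise ValueError("Vectors must have the same length.")
    dot = sum(a * b for a, b in zip(left, right))
    norm_left = math.sqrt(sum(a * a for a in left))
    norm_right = math.sqrt(sum(b * b for b in right))
    if norm_left == 0.0 or norm_right == 0.0:
        # The angle to a zero vector is undefined.
        raise ValueError("Cosine similarity is undefined for a zero vector.")
    return dot / (norm_left * norm_right)
```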

comment:6 Changed 5 weeks ago by jkarder

  • Owner changed from jkarder to abeham
  • Status changed from reviewing to assigned

Reviewed r14659 and r14660:

  • similarity calculators
    • linear linkage: a NullReferenceException is thrown if at least one of the linear linkages is null
    • permutation: an IndexOutOfRangeException is thrown if relative permutations are of length 0
    • all
      • double.NaN is returned if both compared objects are of length 0
      • within the static CalculateSimilarity methods, some exception messages state that "[...] one or both of the provided scopes is null.", whereas no scopes are used
  • equality comparers
    • binary vector: a NullReferenceException is thrown if at least one of the binary vectors is null
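The null and zero-length cases listed above can be guarded explicitly. The following Python sketch shows one possible policy and is not the fix that was committed: the names `safe_similarity` and `vectors_equal` are hypothetical, and the decisions to return 1.0 for two empty vectors and to treat two missing vectors as equal are assumptions.

```python
from typing import Optional, Sequence


def safe_similarity(left: Optional[Sequence], right: Optional[Sequence]) -> float:
    """Position-wise similarity with explicit guards for None and empty input."""
    if left is None or right is None:
        # The message names solutions, not scopes, since no scopes are involved.
        raise ValueError("One or both of the provided solutions is None.")
    if len(left) != len(right):
        raise ValueError("Solutions must have the same length.")
    if len(left) == 0:
        # Without this guard the division below would be 0 / 0.
        return 1.0
    return sum(1 for a, b in zip(left, right) if a == b) / len(left)


def vectors_equal(left: Optional[Sequence], right: Optional[Sequence]) -> bool:
    """Equality check that never dereferences a missing vector."""
    if left is right:
        return True  # same object, or both None
    if left is None or right is None:
        return False
    return len(left) == len(right) and all(a == b for a, b in zip(left, right))
```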

Thanks for implementing these.

I think we should refactor the similarity calculators at some point.


comment:7 Changed 5 weeks ago by jkarder

  • Description modified (diff)

comment:8 Changed 4 weeks ago by abeham

  • Owner changed from abeham to jkarder
  • Status changed from assigned to reviewing

r15067:

  • Implemented review comments
  • Unified implementation of all equality comparers and similarity calculators in BinaryVector, IntegerVector, RealVector, Permutation, and LinearLinkage encodings
  • Added Euclidean distance-based similarity calculators for real and integer vectors using a transformation function with scaling parameter

I used 1 / (1 + x) as the transformation function, which was also mentioned here.
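A minimal Python sketch of this transformation follows. It maps an unbounded Euclidean distance x to a similarity in (0, 1] via 1 / (1 + x). How the scaling parameter enters the actual implementation is not stated in this ticket; dividing the distance by a positive `scale` is an assumption made here, as is the function name.

```python
import math
from typing import Sequence


def euclidean_similarity(
    left: Sequence[float], right: Sequence[float], scale: float = 1.0
) -> float:
    """Transform Euclidean distance into a similarity via 1 / (1 + x)."""
    if len(left) != len(right):
        raise ValueError("Vectors must have the same length.")
    if scale <= 0.0:
        raise ValueError("scale must be positive.")
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(left, right)))
    # Assumption: the scaling parameter divides the distance before the transform.
    return 1.0 / (1.0 + distance / scale)
```

Identical vectors yield 1, and the similarity approaches but never reaches 0 as the distance grows. A larger `scale` makes the similarity fall off more slowly: for the vectors [0, 0] and [3, 4] (distance 5), the result is 1/6 with `scale=1.0` and 0.5 with `scale=5.0`.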

Last edited 4 weeks ago by abeham

comment:9 Changed 2 weeks ago by jkarder

  • Owner changed from jkarder to abeham

Reviewed r15067 and made further changes, please check.

r15162: improved equality comparers

  • got rid of one extra comparison
  • fixed RealVectorEqualityComparer

comment:10 Changed 2 weeks ago by abeham

  • Status changed from reviewing to readytorelease

r15162: ok, thanks for fixing this

comment:11 Changed 10 days ago by abeham

  • Resolution set to done
  • Status changed from readytorelease to closed

r15217: merged revisions 14412, 14475, 14476, 14659, 14660, 14663, 14779, 14780, 14912, 15050, 15067, 15069, 15079, 15162, 15166, 15172, 15173 to stable
