Opened 8 years ago
Closed 7 years ago
#2730 closed enhancement (done)
Add similarity calculators and equality comparers for encodings
| Reported by: | abeham | Owned by: | abeham |
|---|---|---|---|
| Priority: | medium | Milestone: | HeuristicLab 3.3.15 |
| Component: | Encodings | Version: | 3.3.14 |
| Keywords: | | Cc: | |
Description (last modified by jkarder)
Commonly used encodings should have a similarity calculator for computing genotypic distance as well as an equality comparer for tracking unique solutions. I propose adding the following calculators and comparers:
- Hamming similarity calculator for (Binary|Integer|Real)VectorEncoding
- Euclidean similarity calculator for (Integer|Real)VectorEncoding
- Equality comparer for (Integer|Real)VectorEncoding
This ticket depends on #2706.
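For illustration, the role of an equality comparer in tracking unique solutions can be sketched as follows. This is a minimal Python sketch, not the HeuristicLab implementation; the names `vector_key` and `count_unique` are hypothetical, and element-wise exact equality is assumed.

```python
from typing import Hashable, Iterable, Sequence


def vector_key(vector: Sequence[float]) -> Hashable:
    """Return a hashable key; two vectors are equal iff their keys are equal.

    Assumption: equality is exact and element-wise, with no tolerance.
    """
    return tuple(vector)


def count_unique(solutions: Iterable[Sequence[float]]) -> int:
    """Count distinct solutions by collecting their keys in a set."""
    return len({vector_key(s) for s in solutions})
```

With exact comparison, two real vectors that differ only by rounding error count as distinct solutions, which may or may not be what is wanted.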
Change History (11)
comment:1 Changed 8 years ago by abeham
- Component changed from ### Undefined ### to Encodings
- Status changed from new to accepted
comment:2 Changed 8 years ago by abeham
- Owner changed from abeham to jkarder
- Status changed from accepted to reviewing
comment:3 Changed 8 years ago by gkronber
I'd prefer using the terms Hamming distance and Euclidean distance instead of similarity.
comment:4 Changed 8 years ago by abeham
If it calculated the Hamming distance, I'd call it that. I could imagine calling it HammingDistanceBasedSimilarityCalculator, but I despise these long names.
Regarding the use of Euclidean distance in similarity calculators, I found that we need to express similarity in the range [0;1], where 0 means maximum distance and 1 means no distance. Euclidean distance, however, has no maximum. In practice we do have bounds on the vector and could normalize the distance with respect to them (either ignoring cases where the bounds are not respected or truncating values at the bounds), but similarity calculators are those strange classes that are not operators and thus don't have an ExecutionContext to look up parameters. In the end, I think it's more trouble than it's worth.
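To make the distinction concrete, here is a minimal Python sketch of a Hamming-based similarity and of the bound-normalized Euclidean similarity discussed in this comment. The function names, the per-dimension `lower`/`upper` parameters, and the convention of returning 1.0 for empty vectors are assumptions made for illustration; none of this is taken from the HeuristicLab sources.

```python
import math
from typing import Sequence


def hamming_similarity(a: Sequence, b: Sequence) -> float:
    """Fraction of matching positions: 1 - (Hamming distance / length)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    if len(a) == 0:
        return 1.0  # assumption: two empty vectors count as identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def bounded_euclidean_similarity(a, b, lower, upper) -> float:
    """1 - d(a, b) / d_max, where d_max is the diagonal of the bounding box.

    Values outside the bounds are truncated to the bounds first.
    """
    max_dist = math.sqrt(sum((u - l) ** 2 for l, u in zip(lower, upper)))
    if max_dist == 0:
        return 1.0  # degenerate box: every point coincides
    clamp = lambda v, l, u: min(max(v, l), u)
    dist = math.sqrt(sum(
        (clamp(x, l, u) - clamp(y, l, u)) ** 2
        for x, y, l, u in zip(a, b, lower, upper)
    ))
    return 1.0 - dist / max_dist
```

The second function needs the bounds as explicit arguments, which is exactly the information a calculator without an ExecutionContext cannot look up on its own.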
comment:5 Changed 8 years ago by gkronber
Maybe cosine similarity can be used.
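For reference, cosine similarity could be sketched as below. This is a minimal Python illustration with a hypothetical function name. Note that the result lies in [-1, 1] rather than [0;1] when components can be negative, and that it is undefined for zero vectors; raising an error in that case is an assumption of this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # assumption: treat the undefined case as an error
        raise ValueError("Cosine similarity is undefined for zero vectors.")
    return dot / (norm_a * norm_b)
```

Because it only measures direction, two vectors that differ merely in length (for example [1, 2] and [2, 4]) are reported as fully similar.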
comment:6 Changed 7 years ago by jkarder
- Owner changed from jkarder to abeham
- Status changed from reviewing to assigned
- similarity calculators
  - linear linkage: a NullReferenceException is thrown if at least one of the linear linkages is null
  - permutation: an IndexOutOfRangeException is thrown if relative permutations are of length 0
  - all
    - double.NaN is returned if both compared objects are of length 0
    - within the static CalculateSimilarity methods, some exception messages state that "[...] one or both of the provided scopes is null.", whereas no scopes are used
- equality comparers
  - binary vector: a NullReferenceException is thrown if at least one of the binary vectors is null
Thanks for implementing these.
I think we should refactor the similarity calculators at some point.
This ticket depends on #2706.
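The edge cases raised in this review (null arguments, zero-length inputs yielding NaN) can be guarded against along the following lines. This is a hedged Python sketch with a hypothetical function name; raising `ValueError` for missing arguments and returning 1.0 for two empty vectors are assumptions of the sketch, not necessarily the conventions adopted in the actual fix.

```python
from typing import Optional, Sequence


def checked_similarity(a: Optional[Sequence], b: Optional[Sequence]) -> float:
    """Position-wise match ratio with explicit handling of edge cases."""
    if a is None or b is None:
        # explicit argument check instead of failing later on a dereference
        raise ValueError("One or both of the provided vectors is None.")
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    if len(a) == 0:
        # assumption: avoid 0 / 0 (NaN) by treating empty vectors as identical
        return 1.0
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)
```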
comment:7 Changed 7 years ago by jkarder
- Description modified (diff)
comment:8 Changed 7 years ago by abeham
- Owner changed from abeham to jkarder
- Status changed from assigned to reviewing
- Implemented review comments
- Unified implementation of all equality comparers and similarity calculators in BinaryVector, IntegerVector, RealVector, Permutation, and LinearLinkage encodings
- Added Euclidean distance-based similarity calculators for real and integer vectors using a transformation function with scaling parameter
As the transformation function I used 1 / (1 + x), which was also mentioned here.
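The transformation 1 / (1 + x) maps a distance of 0 to a similarity of 1 and lets the similarity approach 0 as the distance grows, so no upper bound on the distance is needed. A minimal Python sketch follows; the function name and the way the `scaling` parameter enters (dividing the distance) are assumptions made for illustration and may differ from the committed implementation.

```python
import math
from typing import Sequence


def euclidean_similarity(a: Sequence[float], b: Sequence[float],
                         scaling: float = 1.0) -> float:
    """Map the Euclidean distance d to a similarity via 1 / (1 + d / scaling).

    Assumption: `scaling` is positive and divides the distance, so larger
    values make the similarity decay more slowly.
    """
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    if scaling <= 0:
        raise ValueError("scaling must be positive.")
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist / scaling)
```

For example, with a distance of 5 the default scaling gives 1 / 6, while `scaling=5` gives 0.5.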
comment:9 Changed 7 years ago by jkarder
- Owner changed from jkarder to abeham
comment:10 Changed 7 years ago by abeham
- Status changed from reviewing to readytorelease
r15162: ok, thanks for fixing this
comment:11 Changed 7 years ago by abeham
- Resolution set to done
- Status changed from readytorelease to closed
r15217: merged revisions 14412, 14475, 14476, 14659, 14660, 14663, 14779, 14780, 14912, 15050, 15067, 15069, 15079, 15162, 15166, 15172, 15173 to stable
r14659:14660: Added similarity calculators and equality comparers, updated project files