Opened 12 years ago
Closed 12 years ago
#1943 closed feature request (done)
Add nearest neighbour model for symbolic classification
Reported by: | abeham | Owned by: | abeham |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.8 |
Component: | Problems.DataAnalysis.Symbolic.Classification | Version: | 3.3.8 |
Keywords: | Cc: |
Description
Instead of using thresholds to discretize the estimated values into classes one could also learn a nearest neighbour model on some of the estimated values and use it to classify the others.
Change History (13)
comment:1 Changed 12 years ago by abeham
- Status changed from new to accepted
comment:2 Changed 12 years ago by abeham
comment:3 Changed 12 years ago by abeham
- Owner changed from abeham to gkronber
- Status changed from accepted to reviewing
comment:4 Changed 12 years ago by gkronber
- Owner changed from gkronber to abeham
- Status changed from reviewing to assigned
review comments for r8606:
- a few comments in SymbolicNearestNeighbourClassificationModel.GetEstimatedClassValues() would be nice. I'm not sure I understand everything correctly...
- SymbolicNearestNeighbourClassificationModel ignores upper and lower estimation limits!
- Estimated values can be NaN or infinity. These are not handled correctly. Maybe this is not necessary, but it is suspicious that there is absolutely no code for this.
- SymbolicNearestNeighbourClassificationModel line 108: is it OK to simply use the first element of the enumerable if there are multiple entries with the same count? Should we use the most frequent class value in this case? The same thing happens in line 89.
- line 66-73: seems like we could also use a binary search here as trainedTargetPair is sorted.
- It seems some pairs of estimated & target values are lost in RecaluateValues is it not necessary to keep all pairs and later in GetEstimatedClassValues determine the most frequent target in the neighborhood?
- In the ModelCreator would it be better to use a larger default value for K than three?
Sorry, I have not yet understood the code fully. Please review my questions and adapt the code where necessary and then assign the ticket back to me. Thanks!
comment:5 Changed 12 years ago by abeham
- Status changed from assigned to accepted
comment:6 Changed 12 years ago by abeham
- Owner changed from abeham to gkronber
- Status changed from accepted to reviewing
- a few comments in SymbolicNearestNeighbourClassificationModel.GetEstimatedClassValues() would be nice.
- SymbolicNearestNeighbourClassificationModel line 108: is it OK to simply use the first element of the enumerable
- line 66-73: seems like we could also use a binary search here as trainedTargetPair is sorted
Comments have been added, code has been rewritten to use BinarySearch and return class with most samples first
- SymbolicNearestNeighbourClassificationModel ignores upper and lower estimation limits!
Added handling of estimation limits
- It seems some pairs of estimated & target values are lost in RecaluateValues
I don't know what you mean, please elaborate
- In the ModelCreator would it be better to use a larger default value for K than three?
I changed it to 11, but feel free to suggest a reasonable default value
comment:7 Changed 12 years ago by abeham
r8979: sealed class
comment:8 Changed 12 years ago by gkronber
- Owner changed from gkronber to abeham
- Status changed from reviewing to assigned
comment:9 Changed 12 years ago by abeham
- Status changed from assigned to accepted
comment:10 Changed 12 years ago by abeham
- Owner changed from abeham to gkronber
- Status changed from accepted to reviewing
r9002: modified nearest neighbor model to include number of samples at each estimated value
The last problem in the review was that the estimated values were flattened with a majority voting at every estimated value (which might contain many samples of different targets). The neighbors were then selected out of the flattened points which loses the information of how many samples have been at a certain value. In this change I keep the number of samples for an estimated value.
comment:11 Changed 12 years ago by gkronber
comment:12 Changed 12 years ago by abeham
- Owner changed from gkronber to abeham
- Status changed from reviewing to readytorelease
The code looks good and is readable, but I think it's a bit inefficient, memory-wise, to hold all estimated and training values. Nevertheless, I think it's not worthy to invest more time in this.
comment:13 Changed 12 years ago by swagner
- Resolution set to done
- Status changed from readytorelease to closed
- Version changed from 3.3.7 to 3.3.8
r8606: Added nearest neighbor model (+creator) for symbolic classification