Opened 5 years ago

Closed 4 years ago

#1913 closed feature request (done)

Implement Neighborhood Components Analysis (NCA)

Reported by: abeham Owned by: abeham
Priority: medium Milestone: HeuristicLab 3.3.8
Component: Algorithms Version: 3.3.8
Keywords: Cc:

Description

Neighborhood Components Analysis (NCA) is a distance metric learning method proposed by Goldberger et al. in 2004; it is a potential improvement on k-nearest-neighbor classification.
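
For context, the objective maximized by NCA (the expected number of correctly classified points under a stochastic nearest-neighbor rule in the projected space) can be sketched in pure Python. This is an illustration of the method from the Goldberger et al. paper, not the HeuristicLab code:

```python
import math

def nca_objective(A, X, y):
    """NCA objective f(A): expected number of correctly classified
    points under a stochastic 1-NN rule after projecting by matrix A.
    A: projection matrix as a list of rows; X: input vectors; y: labels."""
    def project(x):
        return [sum(a * xc for a, xc in zip(row, x)) for row in A]
    Z = [project(x) for x in X]
    n = len(X)
    f = 0.0
    for i in range(n):
        # Softmax over negative squared distances to all other points.
        w = [0.0 if j == i else
             math.exp(-sum((zi - zj) ** 2 for zi, zj in zip(Z[i], Z[j])))
             for j in range(n)]
        total = sum(w)
        if total <= 0:
            continue  # isolated point: no neighbor probabilities exist
        # Probability that point i is classified correctly.
        f += sum(w[j] for j in range(n) if y[j] == y[i]) / total
    return f
```

With two well-separated classes and the identity projection, each point is classified correctly with probability near 1, so f approaches the number of points.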

Change History (31)

comment:2 Changed 5 years ago by abeham

  • Status changed from new to accepted

comment:3 Changed 5 years ago by abeham

r8412: imported branch

comment:4 Changed 5 years ago by abeham

r8420:

  • Worked on NCA
  • Added scatter plot view for the model to show training data when it is reduced to two dimensions

It runs, but I don't think it works correctly yet. I have randomized the initial matrix, because the starting point influences the achievable quality quite a bit.

comment:5 Changed 5 years ago by abeham

r8422:

  • Reduced memory footprint
  • Fixed some bugs
  • Added some tests

comment:6 Changed 5 years ago by abeham

r8424: corrected tests and fixed bugs

comment:7 Changed 5 years ago by abeham

r8425: Added several initialization methods (LDA, PCA, and Random)

comment:8 Changed 5 years ago by abeham

r8427:

  • Updated view
  • Cloned ProblemData for solution

comment:9 Changed 5 years ago by abeham

r8437:

  • Improved speed of NCA
  • Reorganized things

comment:10 Changed 5 years ago by abeham

r8441: added quality output

comment:11 Changed 5 years ago by abeham

r8454:

  • Refactored NCAModel and NeighborhoodComponentsAnalysis algorithm
  • Model now includes NearestNeighborModel
  • Algorithm has ability to be canceled (basically recreated the optimization loop of mincgoptimize)
  • Scaling should work properly now

comment:12 Changed 5 years ago by abeham

r8465: Changed k-NN to move model representation (kdTree) into the model object

comment:13 Changed 5 years ago by abeham

r8466:

  • Refactored classes
  • Added parameter for number of iterations
  • Added a parameter for neighborhood sampling (which speeds up the gradient calculation)
  • Adapted to changed k-NN algorithm
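
The neighborhood sampling idea can be sketched as follows. This is a minimal illustration, not the HeuristicLab code, and the random-subset strategy shown here is an assumption:

```python
import math
import random

def neighbor_weights(i, squared_dists, sample_size=None, rng=random):
    """Unnormalized stochastic-neighbor weights exp(-d^2) for instance i.
    squared_dists: squared distances from i to every instance.
    If sample_size is given, only a random subset of candidate neighbors
    is considered, cutting the per-instance cost of the gradient-related
    sums from O(n) to O(sample_size)."""
    candidates = [j for j in range(len(squared_dists)) if j != i]
    if sample_size is not None and sample_size < len(candidates):
        candidates = rng.sample(candidates, sample_size)
    return {j: math.exp(-squared_dists[j]) for j in candidates}
```

The trade-off is that the sampled sums only approximate the full softmax, so the gradient becomes stochastic.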

comment:14 Changed 5 years ago by abeham

r8467: fixed name of model

comment:15 Changed 5 years ago by abeham

r8470: increased the point size and converted K to a fixed-value parameter

comment:16 Changed 5 years ago by abeham

r8471: integrated branch into trunk

comment:17 Changed 5 years ago by abeham

  • Owner changed from abeham to gkronber
  • Status changed from accepted to reviewing

r8472: removed branch

comment:18 follow-up: Changed 5 years ago by abeham

Just a note: I still want to use the classification penalties in the objective function. I don't think this would be hard to do, but it would probably change the gradient, so I'm first waiting on your input on the current implementation, which is faithful to the paper.

Also, I wanted to allow cancellation of the algorithm while outputting the current result (it would only support "Stop", if this is possible at all). But I didn't find a way to get the CancellationToken for the algorithm. Is it unavailable for FixedDataAnalysisAlgorithms?

comment:19 Changed 5 years ago by abeham

[REVERTED] r8523: added classification penalty into fitness function and gradient

Last edited 5 years ago by abeham (previous) (diff)

comment:20 Changed 5 years ago by abeham

r8681: Added regularization term introduced by Yang and Laaksonen 2007.

I reverted r8523 and instead introduced a regularization term that subtracts the Frobenius norm of the transformation matrix from the fitness function (in our case it is added, since we are minimizing). The parameter lambda (regularization) controls the strength of this penalty and may be set by the user. If it is set to 0, the algorithm exhibits the original behavior described by Goldberger et al.
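
In code, the regularized objective looks roughly like this. This is a sketch based on the comment above; the comment mentions the plain Frobenius norm, so that is what is used here (whether the actual implementation squares it is not stated):

```python
import math

def frobenius_norm(A):
    """Frobenius norm of the transformation matrix A (a list of rows)."""
    return math.sqrt(sum(a * a for row in A for a in row))

def regularized_objective(f_value, A, lam):
    """Value to minimize: the negated NCA objective plus the penalty
    lam * ||A||_F. With lam = 0 this reduces to the original
    Goldberger et al. behavior."""
    return -f_value + lam * frobenius_norm(A)
```

Penalizing the norm of A discourages the optimizer from inflating the transformation arbitrarily, which is the overfitting issue Yang and Laaksonen address.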

comment:21 in reply to: ↑ 18 Changed 4 years ago by gkronber

Replying to abeham:

Also I wanted to allow cancellation of the algorithm and output the current result (would only support "Stop" if this is possible). But I didn't find a way to get the CancellationToken for the algorithm. Is this unavailable for FixedDataAnalysisAlgorithms?

The best way to support cancellation would be by refactoring the algorithm into smaller operators and combining it with a gradient descent algorithm.

comment:22 Changed 4 years ago by gkronber

Review comments:

  • The changes in NearestNeighbourModel are OK
  • Should we move the class Matrix into one of the base plugins (e.g., HeuristicLab.Common)?
  • In NcaModel line 92: shouldn't we use a matrix-multiplication here?
  • In NcaAlgorithm line 221: why do we not set a value when normalization <= 0?
  • Why does the algorithm use a sparse matrix? Are the majority of the probability values 0?
  • I reviewed the Gradient() method (r8681) but have to admit that I'm not familiar enough with NCA to be sure that it is implemented correctly.

Overall, the implementation looks good. However, the algorithm is hard to understand even when reading the referenced paper. Therefore, it would be great if the implementation were tested using unit tests in which the results of our implementation (especially of the Gradient() method) are compared with those of a reference implementation (e.g., from the Toolkit for Distance Metric Learning or from the Dimensionality reduction toolbox).

Last edited 4 years ago by gkronber (previous) (diff)

comment:23 Changed 4 years ago by gkronber

  • Owner changed from gkronber to abeham

comment:24 follow-up: Changed 4 years ago by gkronber

Further comments from reviewing the NcaInitializers:

  • In LdaInitializer line 47: this is effectively equivalent to PrepareInputMatrix()... use the existing method instead (the same applies to the PcaInitializer).

  • In RandomInitializer line 76: matrix[i] = random.NextDouble() / range[i / dimensions]; isn't i / dimensions always zero since both are ints?

Last edited 4 years ago by gkronber (previous) (diff)

comment:25 Changed 4 years ago by gkronber

  • Status changed from reviewing to assigned

comment:26 Changed 4 years ago by abeham

r9270: Changed NCA to use LM-BFGS optimization algorithm, added model/solution creators, added operator for gradient calculation

comment:27 in reply to: ↑ 24 Changed 4 years ago by abeham

  • Owner changed from abeham to gkronber
  • Status changed from assigned to reviewing

r9272: removed scaling of data

Replying to gkronber:

Further comments from reviewing the NcaInitializers:

  • In LdaInitializer line 47: this is effectively equivalent to PrepareInputMatrix()... use the existing method instead (the same applies to the PcaInitializer).

I simplified the initializers.

Replying to gkronber:

Review comments:

  • The changes in NearestNeighbourModel are OK
  • Should we move the class Matrix into one of the base plugins (e.g., HeuristicLab.Common)?

Maybe, but it should be checked more thoroughly. I would definitely favor something that allows vector/matrix operations and interfaces with double[,], double[], and maybe also DoubleMatrix/DoubleArray. But I don't know if this implementation is the way to go.

  • In NcaModel line 92: shouldn't we use a matrix-multiplication here?

It is a matrix multiplication already ;-) The Matrix class unfortunately isn't flexible enough. The reduced dataset contains the projected dimensions plus the target (in the last column).

  • In NcaAlgorithm line 221: why do we not set a value when normalization <= 0?

The projected point is too far away from any other point, which means it effectively has no neighbors for classification; a point's influence decreases exponentially with distance.
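
This can be seen directly in double precision: the stochastic-neighbor weight exp(-d^2) underflows to exactly zero once the squared distance is large enough, so the normalizer for such an isolated point is 0 (a minimal illustration):

```python
import math

# Stochastic-neighbor weight exp(-d^2) for a moderate and a large distance.
weight_near = math.exp(-10.0 ** 2)  # about 3.7e-44, still representable
weight_far = math.exp(-30.0 ** 2)   # underflows to exactly 0.0

# A point whose nearest neighbor is this far away has a normalizer of 0,
# so no neighbor probabilities (and no prediction) can be derived for it.
```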

  • Why does the algorithm use a sparse matrix? Are the majority of the probability values 0?

Because the matrix would be instances × instances, which is potentially huge, and yes, most of the values are 0.
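
A sketch of why sparse storage pays off here (a hypothetical helper, not the actual HeuristicLab data structure): storing only the nonzero entries of one row of the n-by-n probability matrix keeps memory proportional to the number of effective neighbors rather than to n.

```python
import math

def sparse_probability_row(i, squared_dists):
    """Nonzero neighbor probabilities p_ij for instance i.
    squared_dists: squared distances from i to all n instances.
    Most exp(-d^2) weights underflow to 0.0 and are simply not stored."""
    weights = {j: math.exp(-d2)
               for j, d2 in enumerate(squared_dists) if j != i}
    weights = {j: w for j, w in weights.items() if w > 0.0}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()} if total > 0.0 else {}
```

For a point with only a handful of near neighbors, the returned dict has a handful of entries instead of n.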

  • I reviewed the Gradient() method (r8681) but have to admit that I'm not familiar enough with NCA to be sure that it is implemented correctly.

Overall, the implementation looks good. However, the algorithm is hard to understand even when reading the referenced paper. Therefore, it would be great if the implementation were tested using unit tests in which the results of our implementation (especially of the Gradient() method) are compared with those of a reference implementation (e.g., from the Toolkit for Distance Metric Learning or from the Dimensionality reduction toolbox).

I guess I need to install MATLAB after all...

comment:28 Changed 4 years ago by gkronber

  • Status changed from reviewing to readytorelease

reviewed r9270 and r9272.

comment:29 Changed 4 years ago by abeham

  • Owner changed from gkronber to abeham
  • Status changed from readytorelease to assigned

There is a problem remaining.

comment:30 Changed 4 years ago by abeham

  • Status changed from assigned to reviewing

r9451: fixed handling of non-sequential, natural-numbered class values in the LDA initializer

comment:31 Changed 4 years ago by abeham

  • Status changed from reviewing to readytorelease

comment:32 Changed 4 years ago by swagner

  • Resolution set to done
  • Status changed from readytorelease to closed
  • Version changed from 3.3.7 to 3.3.8