Opened 5 months ago

Last modified 7 days ago

#2700 reviewing feature request

t-Distributed Stochastic Neighbor Embedding

Reported by: bwerth Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.15
Component: Algorithms.DataAnalysis Version: branch
Keywords: Cc:

Description

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality-reduction technique that is particularly well suited for visualizing high-dimensional datasets, and it should be available in HeuristicLab.
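In the standard formulation of t-SNE, similarities P between the input points are compared with similarities Q between the embedded points. Q is computed from a heavy-tailed Student-t kernel, and the error is the Kullback-Leibler divergence KL(P || Q). The sketch below is a minimal exact O(n²) version in plain Python. The function names are hypothetical, and this is not the HeuristicLab implementation.

```python
import math


def student_t_affinities(Y):
    """Joint low-dimensional affinities of t-SNE (exact O(n^2) version).

    q_ij = (1 + ||y_i - y_j||^2)^-1 / sum_{k != l} (1 + ||y_k - y_l||^2)^-1, q_ii = 0
    """
    n = len(Y)
    W = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                W[i][j] = 1.0 / (1.0 + d2)  # Student-t kernel with one degree of freedom
                total += W[i][j]
    return [[w / total for w in row] for row in W]


def kl_divergence(P, Q, eps=1e-12):
    """t-SNE error KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij); entries with p_ij = 0 contribute nothing."""
    return sum(p * math.log(p / max(q, eps))
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)
```

The heavy tail of the Student-t kernel lets moderately dissimilar points sit far apart in the embedding. This is what separates clusters in t-SNE plots.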

Change History (19)

comment:1 Changed 5 months ago by bwerth

  • Owner set to bwerth
  • Status changed from new to assigned

comment:2 Changed 5 months ago by bwerth

  • Type changed from defect to feature request

comment:3 Changed 5 months ago by bwerth

  • Status changed from assigned to accepted
  • Version changed from 3.3.14 to branch

comment:4 Changed 5 months ago by bwerth

r14387 created initial branch

comment:5 Changed 4 months ago by bwerth

r14413 refactored C++-style code to C# (use of [,] arrays, int vs. uint, ...) and corrected IterationsCounter

Last edited 4 months ago by bwerth

comment:6 Changed 4 months ago by bwerth

r14414 forgot to add files

comment:7 Changed 4 months ago by bwerth

  • Owner changed from bwerth to gkronber
  • Status changed from accepted to reviewing

comment:8 Changed 3 months ago by gkronber

Comments from an initial review:

  • It should be possible to stop the algorithm at any time
  • A quality line chart for the error would probably be interesting
  • It would be nice to be able to view the projection after each iteration
  • The descriptions for parameters should contain information on default settings or useful settings for the parameters.
  • Is it necessary to tune all the parameters for learning, or would it also be OK to just use some robust default settings and hide most of the parameters (except for perplexity)?
  • I think it is not strictly necessary that TSNE derives from Item (since it is probably never used directly in the GUI).
  • Error message "Perplexity should be lower than K": what is K?

Let's discuss this in person...

Last edited 7 days ago by gkronber
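Perplexity sets the width of the Gaussian around each input point. For point i, a precision beta_i is searched so that the conditional distribution p_{j|i} over its neighbours has perplexity 2^H equal to the requested value. With K neighbours the entropy is at most log K (all weights equal). A requested perplexity of K or more therefore cannot be matched, which is presumably what the quoted error message guards against. Reading K as the number of neighbours considered per point is an assumption.

The following is a minimal sketch of the per-point binary search. The names are hypothetical, and it is not the HeuristicLab code.

```python
import math


def calibrate_row(sq_dists, perplexity, tol=1e-5, max_iter=200):
    """Return p_{j|i} over the neighbours of one point i, calibrated to the given perplexity.

    sq_dists: squared distances from point i to its neighbours (i itself excluded).
    Perplexity is 2**H with H the entropy in bits; comparing the natural-log entropy
    with log(perplexity) is equivalent.
    """
    target = math.log(perplexity)
    beta, lo, hi = 1.0, 0.0, math.inf  # beta = 1 / (2 sigma^2), bracketed by [lo, hi]
    d_min = min(sq_dists)
    for _ in range(max_iter):
        # Shifting by d_min avoids underflow; the constant factor cancels in the normalisation.
        weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
        total = sum(weights)
        p = [w / total for w in weights]
        entropy = -sum(x * math.log(x) for x in p if x > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:
            # Distribution too flat: increase the precision (narrower Gaussian).
            lo = beta
            beta = beta * 2.0 if hi == math.inf else (beta + hi) / 2.0
        else:
            # Distribution too peaked: decrease the precision (wider Gaussian).
            hi = beta
            beta = (beta + lo) / 2.0
    return p
```

If the requested perplexity is at least the number of neighbours, the search keeps halving beta and returns (almost) uniform weights instead of the requested perplexity.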

comment:9 Changed 3 months ago by gkronber

  • Owner changed from gkronber to bwerth
  • Status changed from reviewing to assigned

comment:10 Changed 3 months ago by gkronber

r14503: minor change while reviewing

comment:11 Changed 3 months ago by bwerth

r14512 incorporated several comments from mkommend, extended the analysis during the algorithm run, added more distances, and made the algorithm stoppable

comment:12 Changed 3 months ago by gkronber

More observations:

  • TSNE should be a BasicAlgorithm
  • Exception when switching between views (projected data & quality line chart) while the algorithm is running
  • r14512 added references to files for a kernel PCA in the project file (please remove).
  • Why does the error change abruptly when the 'stop-lying-iteration' is reached? (--> OK)
  • Hide parameters: Momentum, Eta, MomentumSwitch, StopLying. Set StopLying to zero by default.

Last edited 7 days ago by gkronber
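The hidden learning parameters (Eta, Momentum, MomentumSwitch, StopLying) control a gradient descent with momentum and per-coordinate adaptive gains. During the first StopLying iterations the input similarities P are multiplied by an exaggeration factor ("lying" about P). If the reported error is computed against the exaggerated P, it necessarily jumps when the factor is removed.

The sketch below uses the exact gradient rather than a Barnes-Hut approximation. Its defaults are assumptions taken from van der Maaten's reference C++ code and are not necessarily HeuristicLab's:

- eta 200
- momentum 0.5, switching to 0.8 at iteration 250
- exaggeration 12 until iteration 250
- initial coordinates with standard deviation 1e-4

Setting stop_lying=0 disables the exaggeration, as requested above. All names are hypothetical.

```python
import random


def _sign(x):
    return (x > 0) - (x < 0)


def tsne_gradient(P, Y, exaggeration=1.0):
    """Exact gradient: dC/dy_i = 4 * sum_j (e * p_ij - q_ij) * (y_i - y_j) / (1 + ||y_i - y_j||^2)."""
    n, dim = len(Y), len(Y[0])
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
    z = sum(map(sum, W))
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                coeff = 4.0 * (exaggeration * P[i][j] - W[i][j] / z) * W[i][j]
                for d in range(dim):
                    grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad


def optimize(P, dim=2, iterations=1000, eta=200.0, momentum=0.5, final_momentum=0.8,
             momentum_switch=250, stop_lying=250, exaggeration=12.0, seed=0):
    """Gradient descent with momentum, adaptive gains and early exaggeration on a symmetric P."""
    rng = random.Random(seed)
    n = len(P)
    Y = [[rng.gauss(0.0, 1e-4) for _ in range(dim)] for _ in range(n)]
    update = [[0.0] * dim for _ in range(n)]
    gains = [[1.0] * dim for _ in range(n)]
    for it in range(iterations):
        factor = exaggeration if it < stop_lying else 1.0  # early exaggeration of P
        mom = momentum if it < momentum_switch else final_momentum
        grad = tsne_gradient(P, Y, factor)
        for i in range(n):
            for d in range(dim):
                # Steps move against the gradient, so differing signs mean the gradient still
                # points the same way as the previous step: grow the gain, otherwise shrink it.
                if _sign(grad[i][d]) != _sign(update[i][d]):
                    gains[i][d] += 0.2
                else:
                    gains[i][d] = max(gains[i][d] * 0.8, 0.01)
                update[i][d] = mom * update[i][d] - eta * gains[i][d] * grad[i][d]
                Y[i][d] += update[i][d]
        # The objective is translation-invariant; keep the embedding centred.
        for d in range(dim):
            mean = sum(y[d] for y in Y) / n
            for y in Y:
                y[d] -= mean
    return Y
```

The eta default is meant for datasets with many points. With only a handful of points each p_ij is large, and a much smaller step size is likely needed.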

comment:13 Changed 3 months ago by bwerth

r14518 TSNEAnalysis is now a BasicAlgorithm; hid parameters; added optional data normalization to make TSNE scale-invariant
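One common way to get such invariance is to z-score every input feature before distances are computed. Multiplying a column by a positive constant (for example, changing its unit) then leaves the standardized data, and hence all pairwise distances, unchanged. This ticket does not say whether r14518 standardizes by standard deviation, by range, or by a single global factor. The per-column z-score below is an assumption, and the function name is hypothetical.

```python
import math


def standardize_columns(X):
    """Z-score each column (feature) of X: subtract its mean, divide by its standard deviation.

    Constant columns are only centred (divisor 1) to avoid a division by zero.
    """
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    stds = []
    for j in range(m):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in X]
```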

comment:14 Changed 2 months ago by bwerth

r14558 made TSNE compatible with the new pausable BasicAlgorithms; removed the rescaling of the scatter plots during the run to give it a more movie-like feel

comment:15 Changed 5 weeks ago by abeham

r14682: fixed references for alglib and libsvm

comment:16 Changed 2 weeks ago by bwerth

  • Owner changed from bwerth to gkronber
  • Status changed from assigned to reviewing

r14742 fixed the display of the randomly generated seed; some minor code simplifications

comment:17 Changed 7 days ago by gkronber

The 'performance-improved' distance methods, which also accept a threshold, seem to be implemented incorrectly. However, they are not used by t-SNE anyway, so I'm removing them.

 sum = 0;
 ...
 while(sum > threshold ...) {
   sum += ...
 }
 return sum;
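Assume the elided condition is the usual dimension bound joined with `&&`, and the elided body adds one squared difference. Since sum starts at 0, the loop above is then never entered for a non-negative threshold, and the method always returns 0. An early-abort variant has to keep summing only while the partial sum is still within the threshold. The following is a minimal sketch for the squared Euclidean distance; the name is hypothetical.

```python
def squared_euclidean_with_threshold(a, b, threshold):
    """Squared Euclidean distance that stops once the partial sum exceeds threshold.

    A result <= threshold is the exact distance; a result > threshold is only a partial
    sum, which suffices for callers that just need to know "farther than threshold".
    """
    total = 0.0
    i = 0
    while total <= threshold and i < len(a):
        diff = a[i] - b[i]
        total += diff * diff
        i += 1
    return total
```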

comment:18 Changed 7 days ago by gkronber

VPTree contains the TODO item "TODO check if minheap or maxheap should be used here".
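For k-nearest-neighbour search, the candidate set is normally kept in a max-heap keyed on distance. Its root is the farthest of the current k candidates, so it is the element to evict when a closer point turns up. Its distance is also the radius tau that VP-tree search uses to prune subtrees. For comparison, std::priority_queue, which van der Maaten's reference C++ vptree.h uses here, is a max-heap by default.

The following is a minimal sketch of that bounded heap, with hypothetical names. Python's heapq is a min-heap, so the distances are stored negated.

```python
import heapq


class KNearest:
    """Keeps the k closest candidates seen so far in a max-heap keyed on distance."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self._heap = []  # entries (-distance, index): the root is the farthest candidate

    @property
    def tau(self):
        """Current search radius: infinite until k candidates have been collected."""
        return -self._heap[0][0] if len(self._heap) == self.k else float("inf")

    def offer(self, distance, index):
        """Add a candidate if it is closer than the current k-th nearest one."""
        if distance < self.tau:
            heapq.heappush(self._heap, (-distance, index))
            if len(self._heap) > self.k:
                heapq.heappop(self._heap)  # evict the farthest candidate

    def result(self):
        """Neighbours as (distance, index), sorted by increasing distance."""
        return [(-neg_d, i) for neg_d, i in sorted(self._heap, reverse=True)]
```

A min-heap would instead put the closest candidate at the root. Evicting the farthest candidate, or reading tau, would then require a linear scan.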

comment:19 Changed 7 days ago by gkronber

r14767: made some changes while reviewing the code
