The goal is to cluster 10 million functions with ~100 samples each.

Hierarchical (agglomerative) clustering seems useful.

Approximate hierarchical clustering methods:
  - HappieClust: terminates with runtime exceptions.
  - Twister Tries: not tested yet; we need to write a small Java program to try it.


We could implement our own clustering if we had a fast method for finding nearest neighbours.
Approximate nearest neighbours:
  - Benchmarks of many current techniques: https://github.com/erikbern/ann-benchmarks
  - Annoy (Spotify): https://github.com/spotify/annoy
  - FALCONN: https://github.com/FALCONN-LIB/FALCONN
  - nmslib (fastest in the benchmarks): https://github.com/searchivarius/nmslib
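
A rough sketch of what "our own clustering" on top of a nearest-neighbour search could look like. Assumptions, none of which come from the notes above: each function is represented as a fixed-length vector of its samples, distance is Euclidean, and clusters are formed by single linkage on the k-nearest-neighbour graph (merge along the shortest edges first, stop at a distance cutoff). The names `brute_force_knn`, `cluster_from_knn`, `k` and `max_dist` are made up for illustration. `brute_force_knn` is an exact O(n^2) placeholder that only works for toy data; for 10 million functions it would have to be replaced by one of the approximate libraries listed above.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# For each point: list of (neighbour index, distance) pairs.
Neighbours = List[List[Tuple[int, float]]]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(points: Sequence[Vector], k: int) -> Neighbours:
    """Exact k nearest neighbours in O(n^2). Placeholder for an ANN index."""
    result: Neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (euclidean(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append([(j, d) for d, j in dists[:k]])
    return result


def cluster_from_knn(neighbours: Neighbours, max_dist: float):
    """Single linkage restricted to the kNN graph.

    Edges are processed from shortest to longest and joined with a
    union-find structure. Returns (labels, merges): labels[i] is the
    representative index of point i's cluster, merges is the ordered list
    of (distance, root_a, root_b) joins, i.e. a truncated dendrogram.
    """
    parent = list(range(len(neighbours)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (d, i, j) for i, nbrs in enumerate(neighbours) for j, d in nbrs
    )
    merges = []
    for d, i, j in edges:
        if d > max_dist:
            break  # all remaining edges are longer than the cutoff
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            merges.append((d, ri, rj))
    labels = [find(i) for i in range(len(neighbours))]
    return labels, merges


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, merges = cluster_from_knn(brute_force_knn(points, k=2), max_dist=2.0)
print(labels)  # -> [0, 0, 2, 2]
```

Because only the k nearest neighbours of each point are considered, the result is an approximation of true single linkage: two clusters whose closest pair is not in any kNN list are never merged. Whether single linkage is the right criterion for the functions at hand is an open question.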