The goal is to cluster 10 million functions with ~100 samples. Hierarchical (agglomerative) clustering seems useful.

Approximate hierarchical clustering methods:
- Happieclust: terminates with runtime exceptions
- Twistertries: need to implement a small Java program to test this.

We could implement our own clustering if we have a fast method for finding nearest neighbours.

Approximate nearest neighbours:
- Benchmarks with many current techniques: https://github.com/erikbern/ann-benchmarks
- Annoy (Spotify): https://github.com/spotify/annoy
- FALCONN: https://github.com/FALCONN-LIB/FALCONN
- Fastest in benchmarks: https://github.com/searchivarius/nmslib
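
The idea of building our own clustering on top of a fast nearest-neighbour search could look roughly like the sketch below. It is only an illustration under assumptions: the names `brute_force_knn` and `cluster_by_neighbours` are hypothetical, each function is assumed to be represented as a fixed-length vector of its samples, and Euclidean distance is assumed as the metric. The exact brute-force search is a stand-in that would have to be replaced by one of the approximate indexes listed above to scale to millions of items. The sketch links every point to its k nearest neighbours within a distance threshold and returns the connected components, which roughly corresponds to cutting a single-linkage hierarchy at that threshold; it does not build the full hierarchy.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# A k-NN function takes (data, query index, k) and returns [(neighbour index, distance)].
KnnFn = Callable[[Sequence[Vector], int, int], List[Tuple[int, float]]]


def brute_force_knn(data: Sequence[Vector], query_idx: int, k: int) -> List[Tuple[int, float]]:
    """Exact k-NN by linear scan (hypothetical stand-in for an approximate index)."""
    query = data[query_idx]
    dists = [(math.dist(query, v), j) for j, v in enumerate(data) if j != query_idx]
    dists.sort()
    return [(j, d) for d, j in dists[:k]]


def cluster_by_neighbours(
    data: Sequence[Vector],
    k: int = 2,
    max_dist: float = 1.0,
    knn: KnnFn = brute_force_knn,
) -> List[int]:
    """Label each point with the smallest index in its connected component.

    Two points are connected if one is among the other's k nearest
    neighbours and their distance is at most max_dist.
    """
    parent = list(range(len(data)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(data)):
        for j, d in knn(data, i, k):
            if d <= max_dist:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the root so labels are stable.
                    parent[max(ri, rj)] = min(ri, rj)

    return [find(i) for i in range(len(data))]
```

With a small `k`, a cluster can be split where the single-linkage cut would keep it together, so `k` and `max_dist` would need tuning on real data. Repeating the pass with increasing thresholds is one possible way to approximate several levels of a hierarchy.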