Opened 17 months ago

Last modified 9 months ago

#2886 accepted feature request

Implement Naive Grammar Enumeration for Symbolic Regression

Reported by: lkammere
Owned by: lkammere
Priority: medium
Milestone:
Component: Algorithms.DataAnalysis
Version: branch
Keywords:
Cc:

Description

As a first step toward deterministic symbolic regression, implement an algorithm that enumerates and evaluates all possible sentences of a grammar in order to find the best model structure for (very simple) regression problems.
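A minimal sketch of the idea, assuming a toy grammar and a sum-of-squared-errors score. The grammar, `enumerate_sentences`, and `best_sentence` are illustrative names and are not taken from the branch. Phrases are expanded breadth first at their leftmost nonterminal, and every complete sentence is scored against the data.

```python
from collections import deque

# Toy grammar (an assumption for illustration, not the grammar used in the branch).
# Nonterminals are the dictionary keys; every other symbol is a terminal.
GRAMMAR = {
    "Expr": [["Term"], ["Term", "+", "Expr"]],
    "Term": [["Var"], ["Var", "*", "Term"]],
    "Var": [["x"], ["y"]],
}


def enumerate_sentences(start="Expr", max_length=3):
    """Yield every sentence with at most max_length symbols, breadth first."""
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        phrase = queue.popleft()
        # Find the leftmost nonterminal; a phrase without one is a sentence.
        idx = next((i for i, s in enumerate(phrase) if s in GRAMMAR), None)
        if idx is None:
            yield phrase
            continue
        for production in GRAMMAR[phrase[idx]]:
            expanded = phrase[:idx] + tuple(production) + phrase[idx + 1:]
            # No production shortens a phrase, so over-long phrases can be pruned.
            if len(expanded) <= max_length and expanded not in seen:
                seen.add(expanded)
                queue.append(expanded)


def best_sentence(data, max_length=3):
    """Return the sentence with the lowest sum of squared errors on data.

    data is a list of (x, y, target) rows.
    """
    def sse(sentence):
        expr = " ".join(sentence)
        return sum((eval(expr, {}, {"x": x, "y": y}) - target) ** 2
                   for x, y, target in data)

    return min(enumerate_sentences(max_length=max_length), key=sse)
```

For data generated by `target = x * y`, `best_sentence` returns one of the two product sentences. The number of sentences grows quickly with `max_length`, which is why the approach is limited to very simple problems unless duplicates are pruned.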

Change History (50)

comment:1 Changed 17 months ago by lkammere

  • Status changed from new to accepted

comment:2 Changed 17 months ago by lkammere

r15712: Add basic class structure, grammar and grammar iteration.

r15714: Add tree hashing for addition and multiplication.

r15722: Add evaluation of sentences.

r15723: Add simple data analysis tests and further information about algorithm run.

r15724: Add parsing to infix form for debugging purpose.

r15725: Refactor tree hash function.

r15726: Overwrite long sentences when a shorter one with same hash was found.

r15734: worked on grammar enumeration

r15746: Refactor grammar enumeration alg.

r15765: Add graphviz output.
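The hashing of additions and multiplications (r15714) and the replacement of long sentences by shorter ones with the same hash (r15726) could look roughly like the sketch below. It assumes expressions are nested tuples such as `("+", "x", "y")` and uses the canonical form itself as the dictionary key instead of an integer hash; `canonical`, `size`, and `archive_shortest` are hypothetical names, not the branch's API.

```python
def canonical(expr):
    """Return a form in which operand order and nesting of + and * do not matter."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    flat = []
    for arg in map(canonical, args):
        # Flatten nested applications of the same operator (associativity).
        if isinstance(arg, tuple) and arg[0] == op:
            flat.extend(arg[1:])
        else:
            flat.append(arg)
    # Sort operands so commuted variants coincide; repr gives a total order
    # over the mix of strings and tuples.
    return (op, *sorted(flat, key=repr))


def size(expr):
    """Number of nodes in the expression tree."""
    return 1 if isinstance(expr, str) else 1 + sum(size(a) for a in expr[1:])


def archive_shortest(expressions):
    """Keep, per canonical form, only the smallest expression seen so far."""
    archive = {}
    for expr in expressions:
        key = canonical(expr)
        if key not in archive or size(expr) < size(archive[key]):
            archive[key] = expr
    return archive
```

With this, `("+", "x", "y")` and `("+", "y", "x")` map to the same key, so only one of them needs to be evaluated.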

comment:3 Changed 17 months ago by lkammere

r15772: Extend grammar enumeration algorithm's grammar to exp, log and sine.

r15773: Update unit tests to cover problems with exp, log and sine.

comment:4 Changed 17 months ago by lkammere

r15776: Refactor data generation in unit tests.

comment:5 Changed 17 months ago by lkammere

r15784: Add basic implementation for inverse factors.

comment:6 Changed 16 months ago by gkronber

r15806: made a few comments

comment:7 Changed 16 months ago by lkammere

r15791: Replace integer hashing of phrases with simplification to (temporary) string representations.

r15795: Remove nested divisions from grammar and hashing.

r15800: Refactor code and fix performance issues.

r15803: Deactivate generation of dot file for visualizing search tree.

r15812: Performance Improvements - Only store hash of archived phrases and reduce number of enumerators.

comment:8 Changed 16 months ago by lkammere

r15817: Add cosine to grammar.

comment:9 Changed 16 months ago by lkammere

r15821: Move code for visualization and logging of sentences to separate classes.

r15823: Fixed build settings.

r15824: Move R² calculation of sentences to separate class and allow its deactivation.

comment:10 Changed 16 months ago by lkammere

r15825: Added prebuild event.

r15827: Change implementation of symbol strings from list to array.

r15828: Implement IEquatable interface in symbols. Minor performance improvements.

comment:11 Changed 16 months ago by lkammere

r15832: Fix Equals methods in Symbols; move semantic hashing of phrases to separate class.

r15834: Store production rules in grammar instead of nonterminal symbols.

r15835: Split huge hashing function into smaller ones.

comment:12 Changed 16 months ago by gkronber

r15840: added utility console program for clustering of expressions (work in progress)

comment:13 Changed 16 months ago by gkronber

r15841: fixed FLANNParameters structure

comment:14 Changed 16 months ago by gkronber

r15842: added clustering of functions and output of clusters, fixed bug in evaluation

comment:15 Changed 15 months ago by lkammere

r15843: Remove duplicates in logged sentences using bash commands.

r15849: Add constants to grammar.

r15850: Remove cosine from grammar.

r15851: Remove cosine terminal symbols.

comment:16 Changed 15 months ago by lkammere

r15860: Change complexity measure from number of nodes in tree to number of variable references.

r15861: Make constant optimization toggleable in algorithm.
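To illustrate the difference between the two complexity measures in r15860, here is a small sketch on nested-tuple expressions. The tuple representation and both function names are assumptions made for this example only.

```python
def node_count(expr):
    """Complexity as the number of nodes in the expression tree."""
    return 1 if isinstance(expr, str) else 1 + sum(node_count(a) for a in expr[1:])


def variable_references(expr, variables):
    """Complexity as the number of leaves that reference an input variable."""
    if isinstance(expr, str):
        return 1 if expr in variables else 0
    return sum(variable_references(a, variables) for a in expr[1:])


# c0 * x + sin(c1 * y): constants and operators do not add to the second measure.
example = ("+", ("*", "c0", "x"), ("sin", ("*", "c1", "y")))
```

For `example`, `node_count` gives 8 while `variable_references(example, {"x", "y"})` gives 2, so adding constants or wrapping a term in a function does not make a sentence count as more complex.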

comment:17 Changed 15 months ago by lkammere

r15883: Prioritize phrases whose (fully expanded) terms result in high R².

comment:18 Changed 15 months ago by lkammere

r15903: worked on cluster analysis / visualization for GPTP

r15907: Changes in search heuristic for solving Poly-10 problem. Adapt tree evaluation to cover non-terminal symbols.

r15910: Fix length parameter when prioritizing phrases and add weighting parameter to control exploration/exploitation during search; fix copy constructors in Analyzers.

comment:19 Changed 15 months ago by gkronber

r15911: Changed initialization of SentenceLogger because started event is not called after Run(). Logging to GZipStream.

comment:20 Changed 14 months ago by gkronber

r15924: remove obsolete code in C# program for the evaluation of sentences, switch to NMSE as quality measure. Tried plotting functions within clusters in R.

comment:21 Changed 13 months ago by bburlacu

r15949: Fix serialization (saving the algorithm).

comment:22 Changed 13 months ago by bburlacu

r15950: Try to use variable importance information (from a random forest) to guide the search.

comment:23 Changed 13 months ago by bburlacu

r15957: Minor refactor; fix multiple analyzer event registration

comment:24 Changed 13 months ago by bburlacu

r15960: Fix serialization and cloning and plugin properties.

comment:25 Changed 12 months ago by bburlacu

r15963: Add missing storable constructors

comment:26 Changed 12 months ago by bburlacu

r15965: Improve hashing performance (about 10% measured improvement)

comment:27 Changed 12 months ago by bburlacu

r15974:

  • implement LRU cache for storing search nodes (needs better integration with the algorithm main loop)
  • introduce SortedSet for handling priorities (better memory usage, possibility to remove bad priorities, slight performance penalty)
  • fix serialization and cloning
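For reference, a least-recently-used cache of the kind mentioned in r15974 can be sketched in a few lines. This is a generic illustration built on `collections.OrderedDict`, not the class from the branch; the name `LRUCache` and its methods are assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded mapping that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Reading an entry marks it as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # The first entry is the least recently used one.
            self._items.popitem(last=False)

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)
```

The trade-off for a search algorithm is that an evicted search node can no longer be looked up, so the cache bounds memory at the cost of possibly revisiting phrases.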

comment:28 Changed 12 months ago by bburlacu

r15975: address additional serialization issues, make Production implement IList<T> (instead of deriving from List<T>)

comment:29 Changed 12 months ago by bburlacu

r15977: Clear search data structures at the end of the run (huge memory savings)

comment:30 Changed 12 months ago by bburlacu

r15979: Register algorithm events after deserialization.

comment:31 Changed 12 months ago by lkammere

r15981: Refactor properties to comply with .NET 4.5.2

comment:32 Changed 12 months ago by bburlacu

r15982: Add storable constructors for analyzers

comment:33 Changed 12 months ago by bburlacu

r15985: Simplify code in RSquaredEvaluator. Turn on linear scaling for the constant optimization evaluator.
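As a rough illustration of what linear scaling does to a quality measure, the sketch below fits an intercept and slope by least squares before computing R² as 1 - SSE/SST. The ticket does not state how the evaluator computes R², so the formulas and the names `linear_scaling` and `r_squared` are assumptions.

```python
def linear_scaling(predicted, target):
    """Return (intercept, slope) minimizing sum((t - (intercept + slope * p))**2)."""
    n = len(target)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_p == 0:
        # A constant prediction can only be shifted to the target mean.
        return mean_t, 0.0
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    slope = cov / var_p
    return mean_t - slope * mean_p, slope


def r_squared(predicted, target):
    """Coefficient of determination, 1 - SSE/SST."""
    mean_t = sum(target) / len(target)
    sst = sum((t - mean_t) ** 2 for t in target)
    if sst == 0:
        raise ValueError("target is constant; R² is undefined")
    sse = sum((t - p) ** 2 for p, t in zip(predicted, target))
    return 1 - sse / sst


predicted = [1.0, 2.0, 3.0, 4.0]
target = [12.0, 14.0, 16.0, 18.0]
intercept, slope = linear_scaling(predicted, target)
scaled = [intercept + slope * p for p in predicted]
```

Here the raw predictions have the right shape but the wrong offset and scale, so the unscaled R² is negative while the scaled model fits the target exactly. With scaling, the search only has to find the structure and does not depend on the optimizer finding the outermost offset and factor.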

comment:34 Changed 12 months ago by bburlacu

r15987: Make sure to clear search data structures before returning in GrammarEnumerationAlgorithm.OnStopped()

comment:35 Changed 12 months ago by bburlacu

r15993: Refactor code

  • move a few methods to the Grammar class
  • use a plain dictionary for storing search nodes in the SearchDataStore (instead of LRU cache)
  • make it easier to keep a consistent state between the algorithm and the evaluator (optimize constants flag)
  • track trajectories in quality/length space for best solutions
  • remove variable importances for now

comment:36 Changed 12 months ago by bburlacu

r15994: Add symbolic regression solution to results during algorithm run and scale model.

comment:37 Changed 11 months ago by bburlacu

r16019: Fix properties lacking implementation in Production.

comment:38 Changed 11 months ago by bburlacu

r16022: Remove MaxSentenceLength from priority calculation for the time being.

comment:39 Changed 11 months ago by bburlacu

r16026:

  • replace functionally-overlapping classes Production and SymbolString with a single class SymbolList
  • refactor methods from Grammar class as methods and properties of SymbolList
  • add parameter for the number of constant optimization iterations
  • refactor code

comment:40 Changed 11 months ago by bburlacu

r16053: Refactor RSquaredEvaluator as a standalone ParameterizedNamedItem which is a parameter of the algorithm. Implement BestSolutionAnalyzer analyzer for quality statistics. Add license headers where missing.

comment:41 Changed 11 months ago by bburlacu

r16056: Remove redundant EvaluatePhrase method in the Grammar class and fix compilation of tests.

comment:42 Changed 11 months ago by bburlacu

r16073: Implement restarts for constant optimization in the RSquaredEvaluator
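Restarts help when the constant-fitting problem has several local optima. The sketch below uses a simple pattern search as a stand-in for the local optimizer, because the ticket does not say which optimizer the evaluator uses. `local_search`, `optimize_with_restarts`, the sampling bounds, and the `double_well` test function are all assumptions for illustration.

```python
import random


def local_search(loss, start, step=0.5, tol=1e-6, max_iterations=10_000):
    """Coordinate-wise pattern search: try +/- step per coordinate, halve step when stuck."""
    point, best = list(start), loss(start)
    for _ in range(max_iterations):
        if step <= tol:
            break
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                candidate = point.copy()
                candidate[i] += delta
                value = loss(candidate)
                if value < best:
                    point, best, improved = candidate, value, True
        if not improved:
            step /= 2
    return point, best


def optimize_with_restarts(loss, start, restarts, bounds=(-5.0, 5.0), seed=0):
    """Run local_search from start, then from `restarts` random points; keep the best."""
    rng = random.Random(seed)
    best_point, best_value = local_search(loss, start)
    for _ in range(restarts):
        random_start = [rng.uniform(*bounds) for _ in start]
        point, value = local_search(loss, random_start)
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value


def double_well(c):
    """Loss with a shallow minimum near c = 2 and a deeper one near c = -2."""
    return (c[0] ** 2 - 4) ** 2 + c[0]
```

Started at `[3.0]`, a single `local_search` on `double_well` settles in the shallower well. Because the first run always uses the given start, the result with restarts is never worse than the single run, and it improves whenever a random start lands in the basin of the deeper well.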

comment:43 Changed 10 months ago by lkammere

r16088: Store pareto-optimal sentences (quality/complexity) to grammar enumeration.
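A sentence is Pareto-optimal here if no other sentence is at least as good in both quality and complexity and strictly better in one of them. A minimal sketch, assuming quality is to be maximized and complexity minimized; the tuple layout and the name `pareto_front` are hypothetical.

```python
def pareto_front(candidates):
    """Return the non-dominated candidates.

    candidates is a list of (sentence, quality, complexity) tuples, where
    higher quality and lower complexity are better.
    """
    front = []
    # Sort by complexity ascending and, within equal complexity, by quality
    # descending. A candidate is then non-dominated exactly when its quality
    # is strictly higher than that of everything kept before it.
    for candidate in sorted(candidates, key=lambda c: (c[2], -c[1])):
        if not front or candidate[1] > front[-1][1]:
            front.append(candidate)
    return front


candidates = [
    ("x", 0.50, 1),
    ("x*y", 0.90, 2),
    ("x+y", 0.70, 2),
    ("x*y+x", 0.90, 3),
    ("x*y+x*x", 0.95, 4),
]
```

For these candidates the front consists of `x`, `x*y`, and `x*y+x*x`: `x+y` is beaten at the same complexity, and `x*y+x` is more complex than `x*y` without being better.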

comment:44 Changed 10 months ago by lkammere

r16090: Explicitly store all pareto-optimal RegressionSolution objects at the end of the algorithm.

comment:45 Changed 9 months ago by gkronber

r16151: deleted obsolete files

comment:46 Changed 9 months ago by bburlacu

r16157:

  • Update IGrammarEnumerationEvaluator interface (add Evaluate method accepting an ISymbolicExpressionTree for the case when the constants have already been optimized in the tree, add boolean OptimizeConstants flag),
  • small refactor in GrammarEnumeration/GrammarEnumerationAlgorithm.cs
  • add unit tests

comment:47 Changed 9 months ago by bburlacu

r16159: Refactor unit test using only C# 4.5 features.

comment:48 Changed 9 months ago by bburlacu

r16176: Fix hashing

comment:49 Changed 9 months ago by bburlacu

r16193: Implement new hasher (faster & less collision prone) and update unit tests.

r16194: Fix compilation errors in test :(

Last edited 9 months ago by bburlacu

comment:50 Changed 9 months ago by lkammere

r16198: Cleanup R-scripts for expression clustering.

r16201: remove branch as it was migrated to Git.
