Opened 7 years ago
Last modified 6 years ago
#2886 accepted feature request
Implement Naive Grammar Enumeration for Symb. Regression
| Reported by: | lkammere | Owned by: | lkammere |
|---|---|---|---|
| Priority: | medium | Milestone: | |
| Component: | Algorithms.DataAnalysis | Version: | branch |
| Keywords: | | Cc: | |
Description
As a first step in deterministic symbolic regression, implement an algorithm that enumerates and evaluates all possible sentences of a grammar in order to find the best model structure for (very simple) regression problems.
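The enumeration idea can be illustrated with a minimal Python sketch. It is not the HeuristicLab implementation: the toy grammar, the function name `enumerate_sentences`, and the `max_length` cutoff are assumptions made for illustration only. Phrases are expanded breadth-first at their leftmost non-terminal, and a phrase without non-terminals is reported as a sentence.

```python
from collections import deque

# Hypothetical toy grammar (not the grammar used in this ticket):
# dictionary keys are non-terminals, every other symbol is a terminal.
GRAMMAR = {
    "Expr": [["Term"], ["Term", "+", "Expr"]],
    "Term": [["Factor"], ["Factor", "*", "Term"]],
    "Factor": [["x"], ["y"]],
}

def enumerate_sentences(grammar, start, max_length):
    """Yield all sentences with at most max_length symbols, breadth-first."""
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        phrase = queue.popleft()
        # Locate the leftmost non-terminal; without one the phrase is a sentence.
        idx = next((i for i, s in enumerate(phrase) if s in grammar), None)
        if idx is None:
            yield phrase
            continue
        for production in grammar[phrase[idx]]:
            expanded = phrase[:idx] + tuple(production) + phrase[idx + 1:]
            # No production in this grammar shrinks a phrase, so phrases that
            # exceed the length limit can be discarded safely.
            if len(expanded) <= max_length and expanded not in seen:
                seen.add(expanded)
                queue.append(expanded)

for sentence in enumerate_sentences(GRAMMAR, "Expr", 3):
    print(" ".join(sentence))
```

In a full algorithm each yielded sentence would be evaluated against the regression data and the best one kept; that step is omitted here.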
Change History (50)
comment:1 Changed 7 years ago by lkammere
- Status changed from new to accepted
comment:2 Changed 7 years ago by lkammere
comment:3 Changed 7 years ago by lkammere
comment:4 Changed 7 years ago by lkammere
r15776: Refactor data generation in unit tests.
comment:5 Changed 7 years ago by lkammere
r15784: Add basic implementation for inverse factors.
comment:6 Changed 7 years ago by gkronber
r15806: made a few comments
comment:7 Changed 7 years ago by lkammere
r15791: Replace integer hashing of phrases with simplification to (temporary) string representations.
r15795: Remove nested divisions from grammar and hashing.
r15800: Refactor code and fix performance issues.
r15803: Deactivate generation of dot file for visualizing search tree.
r15812: Performance Improvements - Only store hash of archived phrases and reduce number of enumerators.
comment:8 Changed 7 years ago by lkammere
r15817: Add cosine to grammar.
comment:9 Changed 7 years ago by lkammere
comment:10 Changed 7 years ago by lkammere
comment:11 Changed 7 years ago by lkammere
comment:12 Changed 7 years ago by gkronber
r15840: added utility console program for clustering of expressions (work in progress)
comment:13 Changed 7 years ago by gkronber
r15841: fixed FLANNParameters structure
comment:14 Changed 7 years ago by gkronber
r15842: added clustering of functions and output of clusters, fixed bug in evaluation
comment:15 Changed 7 years ago by lkammere
comment:16 Changed 7 years ago by lkammere
comment:17 Changed 7 years ago by lkammere
r15883: Prioritize phrases whose (fully expanded) terms result in high R².
comment:18 Changed 6 years ago by lkammere
r15903: worked on cluster analysis / visualization for GPTP
r15907: Changes in search heuristic for solving Poly-10 problem. Adapt tree evaluation to cover non-terminal symbols.
r15910: Fix length parameter when prioritizing phrases and add weighting parameter to control exploration/exploitation during search, fix copy constructors in Analyzers.
comment:19 Changed 6 years ago by gkronber
r15911: Changed initialization of SentenceLogger because started event is not called after Run(). Logging to GZipStream.
comment:20 Changed 6 years ago by gkronber
r15924: remove obsolete code in C# program for the evaluation of sentences, switch to NMSE as quality measure. Tried plotting functions within clusters in R.
comment:21 Changed 6 years ago by bburlacu
r15949: Fix serialization (saving the algorithm).
comment:22 Changed 6 years ago by bburlacu
r15950: Try to use variable importance information (from a random forest) to guide the search.
comment:23 Changed 6 years ago by bburlacu
r15957: Minor refactor; fix multiple analyzer event registration
comment:24 Changed 6 years ago by bburlacu
r15960: Fix serialization and cloning and plugin properties.
comment:25 Changed 6 years ago by bburlacu
r15963: Add missing storable constructors
comment:26 Changed 6 years ago by bburlacu
r15965: Improve hashing performance (about 10% measured improvement)
comment:27 Changed 6 years ago by bburlacu
- implement LRU cache for storing search nodes (needs better integration with the algorithm main loop)
- introduce SortedSet for handling priorities (better memory usage, possibility to remove bad priorities, slight performance penalty)
- fix serialization and cloning
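For readers unfamiliar with the term, the following Python sketch shows what an LRU (least recently used) cache does. It is a generic illustration built on `collections.OrderedDict`; the class name `LruCache` and its methods are assumptions and do not mirror the C# class added in this change (which, per comment:35, was later replaced by a plain dictionary).

```python
from collections import OrderedDict

class LruCache:
    """Fixed-capacity mapping that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A successful lookup marks the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        # Drop the oldest entry once the capacity is exceeded.
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)
```

The trade-off is bounded memory at the cost of possibly re-creating search nodes that were evicted earlier.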
comment:28 Changed 6 years ago by bburlacu
r15975: address additional serialization issues, make Production implement IList<T> (instead of deriving from List<T>)
comment:29 Changed 6 years ago by bburlacu
r15977: Clear search data structures at the end of the run (huge memory savings)
comment:30 Changed 6 years ago by bburlacu
r15979: Register algorithm events after deserialization.
comment:31 Changed 6 years ago by lkammere
r15981: Refactor properties to comply with .NET 4.5.2
comment:32 Changed 6 years ago by bburlacu
r15982: Add storable constructors for analyzers
comment:33 Changed 6 years ago by bburlacu
r15985: Simplify code in RSquaredEvaluator. Turn on linear scaling for the constant optimization evaluator.
comment:34 Changed 6 years ago by bburlacu
r15987: Make sure to clear search data structures before returning in GrammarEnumerationAlgorithm.OnStopped()
comment:35 Changed 6 years ago by bburlacu
r15993: Refactor code
- move a few methods to the Grammar class
- use a plain dictionary for storing search nodes in the SearchDataStore (instead of LRU cache)
- make it easier to keep a consistent state between the algorithm and the evaluator (optimize constants flag)
- track trajectories in quality/length space for best solutions
- remove variable importances for now
comment:36 Changed 6 years ago by bburlacu
r15994: Add symbolic regression solution to results during algorithm run and scale model.
comment:37 Changed 6 years ago by bburlacu
r16019: Fix properties lacking implementation in Production.
comment:38 Changed 6 years ago by bburlacu
r16022: Remove MaxSentenceLength from priority calculation for the time being.
comment:39 Changed 6 years ago by bburlacu
- replace functionally-overlapping classes Production and SymbolString with a single class SymbolList
- refactor methods from Grammar class as methods and properties of SymbolList
- add parameter for the number of constant optimization iterations
- refactor code
comment:40 Changed 6 years ago by bburlacu
r16053: Refactor RSquaredEvaluator as a standalone ParameterizedNamedItem which is a parameter of the algorithm. Implement BestSolutionAnalyzer analyzer for quality statistics. Add license headers where missing.
comment:41 Changed 6 years ago by bburlacu
r16056: Remove redundant EvaluatePhrase method in the Grammar class and fix compilation of tests.
comment:42 Changed 6 years ago by bburlacu
r16073: Implement restarts for constant optimization in the RSquaredEvaluator
comment:43 Changed 6 years ago by lkammere
r16088: Store pareto-optimal sentences (quality/complexity) to grammar enumeration.
comment:44 Changed 6 years ago by lkammere
r16090: Explicitly store all pareto-optimal RegressionSolution objects at the end of the algorithm.
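What "pareto-optimal (quality/complexity)" means can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the committed code: candidates are modelled as hypothetical `(sentence, quality, complexity)` tuples, higher quality is taken to be better, and lower complexity is taken to be better.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is a (sentence, quality, complexity) tuple. A candidate is
    dominated if another one is at least as good in both objectives and
    strictly better in at least one.
    """
    front = []
    # Visit candidates by increasing complexity; for equal complexity the
    # one with the higher quality comes first.
    for cand in sorted(candidates, key=lambda c: (c[2], -c[1])):
        # Keep a candidate only if it beats the quality of every
        # candidate that is no more complex than it.
        if not front or cand[1] > front[-1][1]:
            front.append(cand)
    return front

candidates = [
    ("x", 0.50, 1),
    ("x*y", 0.80, 3),
    ("x+y", 0.70, 3),
    ("x*x*y", 0.80, 5),
    ("x*y+x", 0.95, 5),
]
for sentence, quality, complexity in pareto_front(candidates):
    print(sentence, quality, complexity)
```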
comment:45 Changed 6 years ago by gkronber
r16151: deleted obsolete files
comment:46 Changed 6 years ago by bburlacu
- Update IGrammarEnumerationEvaluator interface (add Evaluate method accepting an ISymbolicExpressionTree for the case when the constants have already been optimized in the tree, add boolean OptimizeConstants flag)
- small refactor in GrammarEnumeration/GrammarEnumerationAlgorithm.cs
- add unit tests
comment:47 Changed 6 years ago by bburlacu
r16159: Refactor unit test using only C# 4.5 features.
comment:48 Changed 6 years ago by bburlacu
r16176: Fix hashing
r15712: Add basic class structure, grammar and grammar iteration.
r15714: Add tree hashing for addition and multiplication.
r15722: Add evaluation of sentences.
r15723: Add simple data analysis tests and further information about the algorithm run.
r15724: Add parsing to infix form for debugging purposes.
r15725: Refactor tree hash function.
r15726: Overwrite long sentences when a shorter one with same hash was found.
r15734: worked on grammar enumeration
r15746: Refactor grammar enumeration alg.
r15765: Add graphviz output.
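The hashing-related revisions above (r15714, r15726) rest on the idea that semantically equal expressions such as x+y and y+x should share one archive entry, with the shortest sentence kept. The Python sketch below illustrates that idea only. The tuple-based tree encoding and the names `canonical_key` and `archive_sentence` are assumptions, and it builds string keys rather than reproducing the hash function used in the branch.

```python
COMMUTATIVE = ("+", "*")

def canonical_key(node):
    """Build a key that ignores operand order for commutative operators.

    A node is either a terminal (str) or a tuple (operator, child, child, ...).
    """
    if isinstance(node, str):
        return node
    op, *children = node
    keys = [canonical_key(child) for child in children]
    if op in COMMUTATIVE:
        keys.sort()  # x+y and y+x now produce the same key
    return "(" + op + " " + " ".join(keys) + ")"

def archive_sentence(archive, tree, sentence):
    """Store the sentence unless an equally short or shorter one is archived.

    Returns True if the archive was updated.
    """
    key = canonical_key(tree)
    known = archive.get(key)
    if known is None or len(sentence) < len(known):
        archive[key] = sentence
        return True
    return False
```

For example, archiving `["(", "x", "+", "y", ")"]` for the tree `("+", "x", "y")` and then `["y", "+", "x"]` for `("+", "y", "x")` leaves only the shorter sentence under the shared key.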