Opened 9 months ago
Last modified 5 weeks ago
#2950 reviewing feature request
Support hash-based simplification of symbolic expressions
Reported by: | bburlacu | Owned by: | gkronber |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.16 |
Component: | Problems.DataAnalysis.Symbolic | Version: | trunk |
Keywords: | Cc: |
Description
Hashing of symbolic expression trees consists of assigning each node an integer hash value such that arithmetically equivalent subtrees get the same value.
This approach allows additional simplification rules and makes it possible to identify equivalent trees by computing hash values bottom-up. For example, a simple tree represented as
Addition
├──Multiplication
│  ├──x
│  └──Multiplication
│     ├──y
│     └──z
└──Multiplication
   ├──Multiplication
   │  ├──x
   │  └──y
   └──z
cannot be fully simplified by the existing tree simplifier, resulting in:
Addition
├──Multiplication
│  ├──y
│  ├──z
│  └──x
└──Multiplication
   ├──x
   ├──y
   └──z
In this case hash-based simplification detects that the two multiplication terms are identical and is able to further simplify the tree:
Multiplication
├──z
├──y
└──x
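The detection step in the example above can be sketched as follows. This is a minimal illustration, not the HeuristicLab code: the `Node` class, the `flatten` and `tree_hash` helpers, and the choice of SHA-256 truncated to 64 bits are all assumptions made for this sketch. It flattens nested commutative nodes, then hashes bottom-up with child hashes sorted, so that the two multiplication terms receive the same value.

```python
import hashlib
from dataclasses import dataclass, field

# Symbols whose argument order does not matter (assumption for this sketch).
COMMUTATIVE = {"Addition", "Multiplication"}

@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)

def flatten(node):
    """Merge nested nodes of the same commutative symbol: x*(y*z) -> x*y*z."""
    kids = [flatten(c) for c in node.children]
    if node.symbol in COMMUTATIVE:
        merged = []
        for k in kids:
            if k.symbol == node.symbol and k.children:
                merged.extend(k.children)
            else:
                merged.append(k)
        kids = merged
    return Node(node.symbol, kids)

def tree_hash(node):
    """Bottom-up 64-bit hash; child hashes are sorted for commutative symbols."""
    child_hashes = [tree_hash(c) for c in node.children]
    if node.symbol in COMMUTATIVE:
        child_hashes.sort()
    h = hashlib.sha256(node.symbol.encode("utf-8"))
    for ch in child_hashes:
        h.update(ch.to_bytes(8, "big"))
    return int.from_bytes(h.digest()[:8], "big")

x, y, z = Node("x"), Node("y"), Node("z")
left = Node("Multiplication", [x, Node("Multiplication", [y, z])])
right = Node("Multiplication", [Node("Multiplication", [x, y]), z])

# After flattening, both terms hash to the same value.
same_after_flatten = tree_hash(flatten(left)) == tree_hash(flatten(right))
```

Sorting the child hashes is what makes the result independent of argument order; flattening is what makes it independent of how the product is nested.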
Additionally, hash values could serve a purpose similar to genetic markers (1), enabling the development of additional diversity-preserving measures and genetic operators.
(1) Burks and Punch, "An analysis of the genetic marker diversity algorithm for genetic programming" https://link.springer.com/content/pdf/10.1007%2Fs10710-016-9281-9.pdf
Change History (29)
comment:1 Changed 9 months ago by bburlacu
- Status changed from new to accepted
comment:2 Changed 9 months ago by bburlacu
comment:3 Changed 8 months ago by bburlacu
r16252: Minor refactor of HashExtensions.cs to allow method chaining. Minor refactor in SymbolicExpressionTreeHash.cs.
comment:4 Changed 8 months ago by bburlacu
- Implement first version of hash-based building blocks analyzer.
- Minor performance improvement in HashExtensions.cs.
- Fix bug in SymbolicExpressionTreeHash.cs where simplification of Multiplication nodes inadvertently altered constant values.
comment:5 Changed 8 months ago by bburlacu
r16258: Simplify code in SymbolicDataAnalysisBuildingBlockAnalyzer and fix build error.
comment:6 Changed 8 months ago by bburlacu
r16259: Add storable constructor.
comment:7 Changed 8 months ago by bburlacu
r16260: Refactor HashExtensions: simplify Reduce method.
comment:8 Changed 8 months ago by bburlacu
r16261: Fix bug in HashUtil.ToByteArray(). Improve hashing performance (10-15% gain) by avoiding array allocations for child node indices.
comment:9 Changed 8 months ago by bburlacu
r16263: Refactor hashing to use unsigned long for hashes. Implement new DiversityPreservingCrossover which prevents subtrees with the same hash value from being swapped.
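The idea of not swapping subtrees with the same hash value can be sketched as below. This is a hypothetical illustration only: trees are nested tuples `(symbol, child, ...)`, and `subtree_hash`, `subtrees`, and `pick_swap` are names invented here, not the operator's actual API. The sketch only selects the pair of subtrees to exchange; the swap itself is omitted, and the hash is order-sensitive for brevity.

```python
import hashlib
import random

def subtree_hash(tree):
    """64-bit hash of a tree given as nested tuples: (symbol, child, ...)."""
    symbol, *children = tree
    h = hashlib.sha256(symbol.encode("utf-8"))
    for child in children:
        h.update(subtree_hash(child).to_bytes(8, "big"))
    return int.from_bytes(h.digest()[:8], "big")

def subtrees(tree):
    """Yield every subtree, root included."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def pick_swap(parent_a, parent_b, rng):
    """Return a (subtree_a, subtree_b) pair with different hashes, or None."""
    candidates_b = [(subtree_hash(s), s) for s in subtrees(parent_b)]
    cut_points = list(subtrees(parent_a))
    rng.shuffle(cut_points)
    for sub_a in cut_points:
        hash_a = subtree_hash(sub_a)
        different = [s for h, s in candidates_b if h != hash_a]
        if different:
            return sub_a, rng.choice(different)
    return None  # every possible exchange would swap identical subtrees
```

For example, `pick_swap(("x",), ("x",), random.Random(0))` returns `None`, because the only possible exchange would swap two identical subtrees.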
comment:10 Changed 8 months ago by bburlacu
r16267: Rename HashNode.IsChild property to IsLeaf
r16270: Fix compilation error in SymbolicDataAnalysisExpressionDiversityPreservingCrossover
r16271: Fix SymbolicDataAnalysisBuildingBlockAnalyzer compilation error.
r16272: Refactor hash extensions and utility methods
- hashes are now computed from byte[] arrays
- Simplify() now accepts an argument specifying which hash function to use.
- Update SymbolicDataAnalysisBuildingBlockAnalyzer and SymbolicDataAnalysisExpressionDiversityPreservingCrossover.
r16273: Improve hashing performance.
comment:11 Changed 8 months ago by bburlacu
r16284: Add the ability to compute the structural similarity between symbolic expression trees.
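The ticket does not state which similarity measure is used. As one plausible sketch, assuming a Sørensen-Dice coefficient over the multisets of subtree hashes (an assumption, as are the names `collect_hashes` and `similarity` and the nested-tuple tree format), structural similarity could be computed like this:

```python
import hashlib
from collections import Counter

def collect_hashes(tree, out):
    """Hash a nested-tuple tree bottom-up and count every subtree hash in `out`."""
    symbol, *children = tree
    h = hashlib.sha256(symbol.encode("utf-8"))
    for child in children:
        h.update(collect_hashes(child, out).to_bytes(8, "big"))
    value = int.from_bytes(h.digest()[:8], "big")
    out[value] += 1
    return value

def similarity(tree_a, tree_b):
    """Dice coefficient: 2 * |shared subtree hashes| / (|A| + |B|)."""
    hashes_a, hashes_b = Counter(), Counter()
    collect_hashes(tree_a, hashes_a)
    collect_hashes(tree_b, hashes_b)
    shared = sum((hashes_a & hashes_b).values())  # multiset intersection
    total = sum(hashes_a.values()) + sum(hashes_b.values())
    return 2.0 * shared / total
```

Under this definition, identical trees score 1.0 and trees with no common subtree score 0.0. The trees `x + y` and `x + z` share only the leaf `x`, giving 2 * 1 / (3 + 3) = 1/3.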
comment:12 Changed 8 months ago by gkronber
r16290: adjusted scaling code for SymbolicDataAnalysisModels because with the new hashing code and simplification we cannot assume that scale and offset nodes are at the same locations
comment:13 Changed 8 months ago by bburlacu
r16291: Fix typo in ComputeAverageSimilarity
comment:14 Changed 7 months ago by bburlacu
r16302: Add support for strict hashing (taking constants and variable weights into account)
comment:15 Changed 7 months ago by gkronber
- Is "strict hashing" still "hashing"?
- Please try to limit changes to the trunk. I have the feeling that this feature keeps expanding. Larger features should be implemented in a branch and then merged back. The ticket's concern was initially "Support hash-based simplification of symbolic expressions", but the more recent changes are concerned with similarities of symbolic expressions. From my point of view, these are separate concerns.
comment:16 Changed 7 months ago by bburlacu
- Yes, "strict" is just an extra flag to take the coefficients of leaf nodes into account when we assign their initial hash value. No other changes involved.
- Agreed, the additional operators (crossover and analyzer) should probably be moved to a branch.
comment:17 Changed 7 months ago by gkronber
OK, since you still call this "hashing", I assume you create a random bit vector for each distinct real-valued constant.
- How do you determine whether two real-valued constants are equal?
- How do you detect that e.g. 2*1.0 should have the same hash-value as 2.0?
- At which point is this "hashing function" quasi equivalent to evaluating the expression for a number of different random inputs?
comment:18 Changed 7 months ago by bburlacu
- A double is 8 bytes; feeding those bytes to one of the hash functions that takes a byte[] as input determines whether two constants are equal.
- Of course, hashing would not return the same hash value (regardless of "strict") unless the constants are folded in the simplification step.
- Why would evaluation be preferable? It would be much more work and not as reliable. My idea was that we already have scenarios where hashing should not return the same value (because there are some coefficients involved). I thought "strict" could be useful in some of those cases.
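The byte-level comparison discussed in this exchange can be illustrated with a short sketch. The function `leaf_hash`, its `strict` parameter, and the use of SHA-256 are assumptions made for illustration and do not reflect the actual HeuristicLab signatures. In strict mode, the 8 bytes of the double are appended to the symbol name before hashing.

```python
import hashlib
import struct

def leaf_hash(symbol, value=None, strict=False):
    """Hash a leaf; in strict mode the coefficient's 8 bytes are included."""
    data = symbol.encode("utf-8")
    if strict and value is not None:
        data += struct.pack("<d", value)  # IEEE 754 double, little-endian
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

# Non-strict: all constants look alike.
loose_equal = leaf_hash("Constant", 2.0) == leaf_hash("Constant", 3.0)

# Strict: constants are equal only if their byte patterns are identical.
strict_equal = (leaf_hash("Constant", 2.0, strict=True)
                == leaf_hash("Constant", 3.0, strict=True))

# Once 2 * 1.0 has been folded to a single constant, it matches 2.0.
folded_equal = (leaf_hash("Constant", 2 * 1.0, strict=True)
                == leaf_hash("Constant", 2.0, strict=True))
```

One consequence of comparing bytes is that values which differ only by rounding, such as `0.1 + 0.2` and `0.3`, are treated as different constants.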
comment:19 Changed 7 months ago by bburlacu
r16305: Change Simplify inside HashNode to a delegate (instead of an Action) so that the nodes array can be passed as ref. This enables us to resize or alter the nodes array during simplification (e.g., by performing term expansion or similar operations).
comment:20 Changed 6 months ago by bburlacu
r16382: Change signature of ComputeSimilarity methods to accept a generic list of trees. This enables us to directly pass HL ItemArray or ItemList without overhead.
comment:21 Changed 6 months ago by gkronber
Please try to complete your changes on the trunk until the end of the year, so that we can prepare for the next release.
comment:22 Changed 6 months ago by bburlacu
r16478: Reorganize code in SymbolicExpressionTreeHash.cs.
comment:23 Changed 3 months ago by gkronber
Is this ready for review?
comment:24 Changed 2 months ago by gkronber
Please make the required changes and move to review phase.
comment:25 Changed 5 weeks ago by bburlacu
r16979: Simplify symbol comparison (use only calculated hash value). Run simplification in a loop (successive simplification steps until no more changes).
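Running simplification in a loop until nothing changes is a fixed-point iteration. A minimal sketch, assuming nested-tuple trees and a single hypothetical rule `lift_once` that flattens nested commutative nodes by one level per pass (neither is taken from the HeuristicLab code):

```python
COMMUTATIVE = {"Addition", "Multiplication"}

def lift_once(tree):
    """Lift children of a same-symbol commutative child up by one level."""
    symbol, *children = tree
    new_children = []
    for child in children:
        if symbol in COMMUTATIVE and child[0] == symbol and len(child) > 1:
            new_children.extend(child[1:])
        else:
            new_children.append(child)
    return (symbol, *[lift_once(c) for c in new_children])

def simplify_to_fixpoint(tree, rules, max_steps=100):
    """Apply all rules repeatedly until a full pass leaves the tree unchanged."""
    for _ in range(max_steps):
        simplified = tree
        for rule in rules:
            simplified = rule(simplified)
        if simplified == tree:
            return tree
        tree = simplified
    return tree  # step limit reached; return the best result so far

nested = ("Multiplication", ("x",),
          ("Multiplication", ("y",),
           ("Multiplication", ("z",), ("w",))))
flat = simplify_to_fixpoint(nested, [lift_once])
```

A single pass of `lift_once` leaves one nested product in this example, so the loop needs a second pass to reach the flat form and a third to confirm that nothing changes. The `max_steps` bound is a safety net against rules that undo each other.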
comment:26 Changed 5 weeks ago by bburlacu
r16980: Remove building block analyzer (does not belong here), minor refactor in DiversityCrossover.
comment:27 Changed 5 weeks ago by bburlacu
r16983: Remove obsolete Comparer for T in HashNode<T>
comment:28 Changed 5 weeks ago by bburlacu
Current status
- this ticket implements the functionality for expression hashing (trees or symreg sentences) which makes up the foundation for hash-based simplification
- the SymReg algorithm depends on this functionality but implements its own simplification rules
- so far we have basic support for simplification of GP trees (more detailed rules should be developed)
- a DiversityCrossover was added that prevents swapping of subtrees with the same hash value
- Hash-based diversity and building block analyzer are removed (this topic will be developed in a branch instead)
comment:29 Changed 5 weeks ago by bburlacu
- Owner changed from bburlacu to gkronber
- Status changed from accepted to reviewing
r16218: Initial commit of hashing functionality as well as simplification rules for symbolic expression trees. As this is still in development, the public API is not yet established (all methods are public for now).