Opened 6 years ago
Closed 5 years ago
#2950 closed feature request (done)
Support hash-based simplification of symbolic expressions
Reported by: | bburlacu | Owned by: | gkronber |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.16 |
Component: | Problems.DataAnalysis.Symbolic | Version: | trunk |
Keywords: | merged | Cc: |
Description
Hashing of symbolic expression trees consists of assigning each node with a unique integer hash value, such that arithmetically-equivalent nodes get the same value.
This approach would have the advantage of allowing additional simplification rules, as well as identifying equivalent trees by bottom-up calculation of hash values. For example, a simple tree represented as
Addition ├──Multiplication │ ├──x │ └──Multiplication │ ├──y │ └──z └──Multiplication ├──Multiplication │ ├──x │ └──y └──z
cannot be fully simplified by the existing tree simplifier, resulting in:
Addition ├──Multiplication │ ├──y │ ├──z │ └──x └──Multiplication ├──x ├──y └──z
In this case hash-based simplification detects that the two multiplication terms are identical and is able to further simplify the tree:
Multiplication ├──z ├──y └──x
Additionally, hash values could serve a similar purpose as genetic markers (1), enabling the development of additional diversity-preserving measures and genetic operators.
(1) Burks and Punch, "An analysis of the genetic marker diversity algorithm for genetic programming" https://link.springer.com/content/pdf/10.1007%2Fs10710-016-9281-9.pdf
Change History (41)
comment:1 Changed 6 years ago by bburlacu
- Status changed from new to accepted
comment:2 Changed 6 years ago by bburlacu
comment:3 Changed 6 years ago by bburlacu
r16252: Minor refactor of HashExtensions.cs to allow method chaining. Minor refactor in SymbolicExpressionTreeHash.cs.
comment:4 Changed 6 years ago by bburlacu
- Implement first version of hash-based building blocks analyzer.
- Minor performance improvement in HashExtensions.cs.
- Fix bug in SymbolicExpressionTreeHash.cs with simplification for Multiplication nodes inadvertently altering constant values.
comment:5 Changed 6 years ago by bburlacu
r16258: Simplify code in SymbolicDataAnalysisBuildingBlockAnalyzer and fix build error.
comment:6 Changed 6 years ago by bburlacu
r16259: Add storable constructor.
comment:7 Changed 6 years ago by bburlacu
r16260: Refactor HashExtensions: simplify Reduce method.
comment:8 Changed 6 years ago by bburlacu
r16261: Fix bug in HashUtil.ToByteArray(). Improve hashing performance (10-15% gain) by avoiding array allocations for child node indices.
comment:9 Changed 6 years ago by bburlacu
r16263: Refactor hashing to use unsigned long for hashes. Implement new DiversityPreservingCrossover which prevents subtrees with the same hash value from being swapped.
comment:10 Changed 6 years ago by bburlacu
r16267: Rename HashNode.IsChild property to IsLeaf
r16270: Fix compilation error in SymbolicDataAnalysisExpressionDiversityPreservingCrossover
r16271: Fix SymbolicDataAnalysisBuildingBlockAnalyzer compilation error.
r16272: Refactor hash extensions and utility methods
- hashes are now computed from byte[] arrays
- Simplify() now accepts an argument specifying which hash function to use.
- Update SymbolicDataAnalysisBuildingBlockAnalyzer and SymbolicDataAnalysisExpressionDiversityPreservingCrossover.
r16273: Improve hashing performance.
comment:11 Changed 6 years ago by bburlacu
r16284: Add the ability to compute the structural similarity between symbolic expression trees.
comment:12 Changed 6 years ago by gkronber
r16290: adjusted scaling code for SymbolicDataAnalysisModels because with the new hashing code and simplification we cannot assume that scale and offset nodes are at the same locations
comment:13 Changed 6 years ago by bburlacu
r16291: Fix typo in ComputeAverageSimilarity
comment:14 Changed 6 years ago by bburlacu
r16302: Add support for strict hashing (taking constants and variable weights into account)
comment:15 Changed 6 years ago by gkronber
- Is "strict hashing" still "hashing"?
- Please, try to limit changes to the trunk. I have the feeling that this feature expands more and more. Larger features should be implemented in a branch and then merged back. The ticket concern was initially "Support hash-based simplification of symbolic expressions", but the more recent changes are concerned with similarities of symbolic expressions. These are in my point of view separate concerns.
comment:16 Changed 6 years ago by bburlacu
- Yes, "strict" is just an extra flag to take the coefficients of leaf nodes into account when we assign their initial hash value. No other changes involved.
- Agreed, the additional operators (crossover and analyzer) should probably be moved to a branch.
comment:17 Changed 6 years ago by gkronber
Ok, since you still call this "hashing" I assume you create a random bitvector for each different real-valued constant
- How do you determine whether two real-valued constants are equal?
- How do you detect that e.g. 2*1.0 should have the same hash-value as 2.0?
- At which point is this "hashing function" quasi equivalent to evaluating the expression for a number of different random inputs?
comment:18 Changed 6 years ago by bburlacu
- a double is 8 bytes, using one of the hash functions that takes a byte[] as input will determine that
- of course, hashing would not return the same hash value (regardless of "strict"), unless the constants are folded in the simplification step.
- why would evaluation be preferable? it would be much more work and not as reliable. my idea was that we already have scenarios where hashing should not return the same value (because there are some coefficients involved). i thought "strict" could be useful in some of those cases.
comment:19 Changed 6 years ago by bburlacu
r16305: Change Simplify inside HashNode to a delegate (instead of an Action) so that the nodes array can be passed as ref. This enables us to resize/alter the nodes array during simplification (eg, by performing term expansion or similar operations)
comment:20 Changed 6 years ago by bburlacu
r16382: Change signature of ComputeSimilarity methods to accept a generic list of trees. This enables us to directly pass HL ItemAray or ItemList without overhead.
comment:21 Changed 6 years ago by gkronber
Please try to complete your changes on the trunk until the end of the year, so that we can prepare for the next release.
comment:22 Changed 6 years ago by bburlacu
r16478: Reorganize code in SymbolicExpressionTreeHash.cs.
comment:23 Changed 6 years ago by gkronber
Is this ready for review?
comment:24 Changed 6 years ago by gkronber
Please make the required changes and move to review phase.
comment:25 Changed 6 years ago by bburlacu
r16979: Simplify symbol comparison (use only calculated hash value). Run simplification in a loop (successive simplification steps until no more changes).
comment:26 Changed 6 years ago by bburlacu
r16980: Remove building block analyzer (does not belong here), minor refactor in DiversityCrossover.
comment:27 Changed 6 years ago by bburlacu
r16983: Remove obsolete Comparer for T in HashNode<T>
comment:28 Changed 6 years ago by bburlacu
Current status
- this ticket implements the functionality for expression hashing (trees or symreg sentences) which makes up the foundation for hash-based simplification
- the SymReg algorithm depends on this functionality but implements its own simplification rules
- so far we have basic support for simplification of GP trees (more detailed rules should be developed)
- a DiversityCrossover was added that prevents swapping of subtrees with the same hash value
- Hash-based diversity and building block analyzer are removed (this topic will be developed in a branch instead)
comment:29 Changed 6 years ago by bburlacu
- Owner changed from bburlacu to gkronber
- Status changed from accepted to reviewing
comment:30 Changed 6 years ago by mkommend
This ticket blocks the reintegration of the persistence into stable and should be prioritized.
comment:31 Changed 6 years ago by gkronber
Reviewed changesets connected to this ticket and made some changes in r17076.
comment:32 Changed 6 years ago by gkronber
- Status changed from reviewing to readytorelease
comment:33 Changed 6 years ago by abeham
r17090: merged revisions 16218, 16252, 16255, 16258, 16259, 16260, 16261, 16263, 16267, 16270, 16271, 16272, 16273, 16284, 16290, 16291, 16302, 16305, 16382, 16478
comment:34 Changed 6 years ago by abeham
- Keywords depends-2520 added
comment:35 Changed 5 years ago by mkommend
- Keywords merged added; depends-2520 removed
r17099: Merged 16979, 16980, 16983 into stable.
comment:36 follow-up: ↓ 37 Changed 5 years ago by bburlacu
r17132: Small refactor in HashExtensions.cs (better comparison, consistent use of the same hash function). Fix bug and improve robustness of SimplifyMultiplication in SymbolicExpressionHashExtensions.cs
comment:37 in reply to: ↑ 36 Changed 5 years ago by abeham
Replying to bburlacu:
r17132: Small refactor in HashExtensions.cs (better comparison, consistent use of the same hash function). Fix bug and improve robustness of SimplifyMultiplication in SymbolicExpressionHashExtensions.cs
This ticket has been merged to stable and is finished, please make further changes to #3015. We have code reviews for all changes. If these bugs are critical please talk to either Gabriel or Michael to review them.
comment:38 Changed 5 years ago by abeham
comment:39 Changed 5 years ago by abeham
- Keywords depends-2991 partially-merged added; merged removed
comment:40 Changed 5 years ago by abeham
- Keywords merged added; depends-2991 partially-merged removed
r17141: merged 17076 to stable
comment:41 Changed 5 years ago by abeham
- Resolution set to done
- Status changed from readytorelease to closed
r16218: Initial commit of hashing functionality as well as simplification rules for symbolic expression trees. As this is still in development the public api is not yet established (all methods public for now).