Opened 10 years ago
Closed 9 years ago
#2261 closed feature request (done)
Gradient Boosted Trees
Reported by: | gkronber | Owned by: | gkronber |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.12 |
Component: | Algorithms.DataAnalysis | Version: | branch |
Keywords: | | Cc: | |
Description
Change History (43)
comment:1 Changed 10 years ago by gkronber
- Owner set to gkronber
- Status changed from new to assigned
comment:2 Changed 10 years ago by gkronber
- Milestone changed from HeuristicLab 3.3.11 to HeuristicLab 3.3.x Backlog
comment:3 Changed 10 years ago by gkronber
comment:4 Changed 10 years ago by gkronber
r12332: initial import of gradient boosted trees for regression
comment:5 Changed 10 years ago by gkronber
r12342: added cloning method override for previously abstract class RegressionSolution
comment:6 Changed 10 years ago by gkronber
r12349: added serialization constructor for RegressionTreeModel
comment:7 Changed 10 years ago by gkronber
r12371: fixed a small bug (correct mapping of row indices for training partition)
comment:8 Changed 10 years ago by gkronber
r12372: implemented prototype view for gradient boosted trees
comment:9 Changed 10 years ago by gkronber
r12373: added hidden alg.-parameter to disable solution creation (useful for cross-validation)
comment:10 Changed 10 years ago by gkronber
r12374: added absolute and relative error loss functions for GBT.
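This is a minimal Python sketch of what each loss function contributes to gradient boosting. It is not the HeuristicLab C# code, which this ticket does not show. A loss supplies two things: the negative gradient (pseudo-residuals) that the next regression tree is fitted to, and a line search for the best constant step. Only squared and absolute error are covered. The relative error loss is omitted because its exact definition is not given here. The class and method names are hypothetical.

```python
import statistics


class SquaredErrorLoss:
    """L(y, f) = (y - f)^2 (hypothetical analog, not HeuristicLab's class)."""

    def negative_gradient(self, y, pred):
        # -dL/df = 2 (y - f), i.e. proportional to the plain residual
        return [yi - fi for yi, fi in zip(y, pred)]

    def line_search(self, y, pred):
        # argmin_c sum (y - (f + c))^2 is the mean residual
        return statistics.fmean(yi - fi for yi, fi in zip(y, pred))


class AbsoluteErrorLoss:
    """L(y, f) = |y - f| (hypothetical analog)."""

    def negative_gradient(self, y, pred):
        # -dL/df = sign(y - f); exact fits contribute 0
        return [(yi > fi) - (yi < fi) for yi, fi in zip(y, pred)]

    def line_search(self, y, pred):
        # argmin_c sum |y - (f + c)| is the median residual
        return statistics.median(yi - fi for yi, fi in zip(y, pred))
```

Take y = [1, 2, 10] with an all-zero prediction. The squared-error line search returns the mean residual 13/3, and the absolute-error line search returns the median 2. This is why the absolute loss is less affected by the outlier 10.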
comment:11 Changed 10 years ago by gkronber
r12375: made TreeNodes structs instead of classes
comment:12 Changed 10 years ago by gkronber
r12378: Merged revision(s) 12333-12365 from trunk/sources:
#2373: Corrected typo in RandomBinaryVectorCreator by implementing an after-deserialization hook.
#2359: Refactored pruning operators and analyzers.
#2359: Removed commented code from pruning analyzer.
#2378: Vertex.cs: Fixed bug in Label setter.
#2359: The changes in r12358 look fine to me. Added total number of pruned nodes in the analyzer's data table. Removed unused parameter names in the SymbolicDataAnalysisSingleObjectivePruningAnalyzer.
#2345: Fixed x-axis maximum in error characteristics curve.
comment:13 Changed 10 years ago by gkronber
TODO:
- exception occurring for very large tree depths
- crash when the test set is empty
- node queue instead of full expansion to max depth
comment:14 Changed 9 years ago by gkronber
r12495: merged changes from trunk to branch
r12494 #2403: added a null check in the MatlabParameterVectorEvaluator to prevent exceptions when clearstate is called
r12493 #2369: added support for squared errors and relative errors to error-characteristic-curve view
r12492 #2392: implemented PearsonsRCalculator to fix incorrect correlation values in the correlation matrix.
r12491 #2402 don't set task state to waiting when it fails
r12490 #2401 added missing Mono.Cecil plugin dependency
r12488 #2400 - Interfaces for Capacitated-, PickupAndDelivery- and TimeWindowed-ProblemInstances now specify an additional penalty parameter to set the current penalty factor for the constraint relaxation. - The setter of the penalty-property in ...
r12485 #2374 made RegressionSolution and ClassificationSolution non-abstract
r12482 #2320: Fixed warnings in unit test solutions introduced in r12420 by marking methods as obsolete.
r12481 #2320: Fixed AfterDeserialization of GEArtifialAntEvaluator.
r12480 #2320: Fixed error in symbolicexpressiontree crossover regarding the wiring of lookup parameters if persisted file is loaded.
r12479 #2397 moved GeoIP project into ExtLibs
r12478 #2329 fixed bug in simple code editor
r12476 #2331 removed outdated plugins
r12475 #2368 fixed compile warnings
r12474 #2399 worked on Mono project prepare script
r12473 #2329 added a simple code editor for Linux
r12472 #2399 - fixed MathJax.js file name - worked on Mono project prepare script
r12471 #2399 worked on Mono project prepare script
r12470 #2399 fixed pre-build events in project files
r12465 #2399 worked on mono project prepare script
r12464 #2399 added patch to project
r12463 #2399 fixed EPPlus so that it compiles on Linux
r12461 #2398: Skip root and start symbols when calculating impacts and replacement values in the pruning operators.
r12458 #2354 show label when no data is displayed and don't show the legend
r12457 #2353 removed duplicated call to Any() in Hive Status page
r12455 #2368 added support in persistence for typecaches in streams
r12445 #2394: Changed Web.config compilation from debug to release to force script bundling. Changed history loading type from lazy to eager loading to increase performance. Fixed "getCoreStatus" typo in statusCtrl.js
r12443 #2394: Fixed UserTaskQuery and GetStatusHistory in the WebApp.Status plugin
r12442 #2394 added nuget folders to svn ignore list
r12435 #2394: Improved PluginManager and updated hive status monitor.
r12434 #2396 added symbolic expression tree formatter for C#
r12433 #2395: Minor change in DoubleValue.GetValue.
r12432 #2395 Use simple round-trip format for doubles because G17 prints some strange numbers (20.22 to 20.219999999999999999). Some accuracy can still be lost on 64bit machines, but should be very rare and minimal. double.MaxValue can still be pa...
r12431 #2395 Fixed parsing issues by using the G17 format.
r12430 #2394 added missing package config
r12429 #2394 added missing package config
r12428 #2394 added web app and status page to trunk
r12424 #2320: Adapted plugin file and updated project file of SymbolicExpressionTreeEncoding.
r12422 #2320: Merged the encoding class and all accompanying changes in the trunk.
r12401 #2387 Fixed a bug where the automatic selection of the first element behaved differently for the NewItemDialog.
r12400 #2387 Forgot to commit a file.
r12399 #2387 - Added context-menu for expanding and collapsing tree-nodes. - Improved response time when expanding/collapsing all nodes for TypeSelector and NewItemDialog.
r12398 #2387 - Added clearSearch-button in TypeSelector. - Adapted behavior of TypeSelector and NewItemDialog so that a selected node stays selected as long as it matches the search criteria.
r12397 #2387 - Adapted behavior of the matching in the TypeSelector so that it behaves the same as the NewItemDialog. The search string is tokenized by space and matches if all tokens are contained (e.g. "Sym Reg" matches "SymbolicRegression...")...
r12393 #2025 - Removed Expand/CollapseAll buttons. - Removed cycling of items.
r12392 #2386: Updated GetHashCode method in the EnumerableBoolEqualityComparer.
comment:15 Changed 9 years ago by gkronber
r12587: removed everything that is not necessary to review for trunk integration (including experimental views for gradient boosted trees)
comment:16 Changed 9 years ago by gkronber
r12588: merged changes from trunk
comment:17 Changed 9 years ago by gkronber
r12589: adapted interface to use IDataset instead of Dataset and added logistic regression loss function
comment:18 Changed 9 years ago by gkronber
r12590: preparations for trunk integration (adapt to current trunk version, add license headers, add comments, improve code quality)
comment:19 Changed 9 years ago by gkronber
r12591: fixed slow evaluation (evaluation was slow even when results were already cached)
comment:20 Changed 9 years ago by gkronber
r12597: comments and minor improvements
comment:21 Changed 9 years ago by gkronber
r12607: also use line search function for the initial estimation f0, changed logistic regression loss function to match description in GBM paper, comments and code improvements
comment:22 Changed 9 years ago by gkronber
r12611: produce classification solution (discriminant function) when using logistic regression loss
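This sketch assumes "GBM paper" refers to Friedman's gradient boosting paper. It shows that paper's logistic (binomial deviance) loss for labels y in {-1, +1} in Python. The line search is a single Newton-Raphson step, as in that paper. f0 is obtained by running the same line search on an all-zero prediction, which is one reading of r12607. The last function maps the discriminant F(x) from r12611 to a class probability. All function names are hypothetical, and the exponent clamp is an addition of this sketch.

```python
import math


def _exp(z):
    # clamp the exponent so very confident predictions cannot overflow
    return math.exp(min(z, 700.0))


def logistic_loss(y, f):
    # binomial deviance L(y, F) = log(1 + exp(-2 y F)) with y in {-1, +1}
    return math.log1p(_exp(-2.0 * y * f))


def negative_gradient(y, pred):
    # pseudo-residuals r_i = 2 y_i / (1 + exp(2 y_i F(x_i)))
    return [2.0 * yi / (1.0 + _exp(2.0 * yi * fi)) for yi, fi in zip(y, pred)]


def line_search(y, pred):
    # one Newton-Raphson step: gamma = sum(r) / sum(|r| * (2 - |r|))
    r = negative_gradient(y, pred)
    num = sum(r)
    den = sum(abs(ri) * (2.0 - abs(ri)) for ri in r)
    return num / den if den > 0.0 else 0.0


def initial_estimate(y):
    # f0: the same line search, started from an all-zero prediction
    return line_search(y, [0.0] * len(y))


def probability_of_positive(f):
    # map the discriminant F(x) to P(y = +1 | x) = 1 / (1 + exp(-2 F(x)))
    return 1.0 / (1.0 + _exp(-2.0 * f))
```

For y = [1, 1, 1, -1], the one-step f0 is the label mean 0.5. The exact constant minimizer from Friedman's paper, 0.5 * ln((1 + 0.5) / (1 - 0.5)), is about 0.549. The ticket does not say which variant is used. A sample can be classified as positive when F(x) > 0, that is, when probability_of_positive(F(x)) > 0.5.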
comment:23 Changed 9 years ago by gkronber
r12619: replace recursion by a stack to prepare for unbalanced tree expansion
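Recursion needs one call frame per tree level, so very deep or unbalanced trees can exhaust the call stack. This may be related to the exception for very large tree depths in the TODO of comment:13, though the ticket does not say so. The following Python sketch shows the general pattern of replacing recursion with an explicit stack. Here it computes the depth of a tree stored as nested dicts, a representation chosen only for this example.

```python
def max_depth(root):
    """Depth of a binary tree of dicts, computed with an explicit stack.

    A recursive version needs one call frame per level and can exhaust
    the call stack on very deep, unbalanced trees.
    """
    deepest = 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in (node.get("left"), node.get("right")):
            if child is not None:
                stack.append((child, depth + 1))
    return deepest


def chain(length):
    """Build a maximally unbalanced tree: every node has only a left child."""
    root = node = {}
    for _ in range(length - 1):
        node["left"] = {}
        node = node["left"]
    return root
```

max_depth(chain(10000)) returns 10000. A naive recursive version would exceed Python's default recursion limit of 1000.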
comment:24 Changed 9 years ago by gkronber
r12620: corrected check if a split is useful, added a unit test class and added an elaborate comment on split quality calculation
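The ticket does not reproduce the comment on split quality, so this Python sketch assumes the standard variance-reduction criterion. The improvement of a split is the decrease in the sum of squared errors, which only needs the sums and counts of the targets on each side. The definition of a useful split is also an assumption, not necessarily the check from r12620. Here a split must separate distinct x values, leave at least min_leaf_size rows on each side, and have a strictly positive improvement.

```python
def split_improvement(sum_left, n_left, sum_right, n_right):
    """Reduction in sum of squared errors when a node is split in two.

    SSE(node) = sum(y^2) - sum(y)^2 / n; the sum(y^2) terms cancel, so
    improvement = sL^2/nL + sR^2/nR - (sL + sR)^2 / (nL + nR).
    """
    n = n_left + n_right
    total = sum_left + sum_right
    return sum_left ** 2 / n_left + sum_right ** 2 / n_right - total ** 2 / n


def best_split(xs, ys, min_leaf_size=1):
    """Scan the sorted values of one feature; return (improvement, threshold) or None."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    best = None
    sum_left = 0.0
    for i in range(n - 1):
        sum_left += pairs[i][1]
        n_left = i + 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # equal x values cannot be separated by a threshold
        if n_left < min_leaf_size or n - n_left < min_leaf_size:
            continue  # one side would be too small
        gain = split_improvement(sum_left, n_left, total - sum_left, n - n_left)
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, (pairs[i][0] + pairs[i + 1][0]) / 2.0)
    return best
```

For x = [1, 2, 3, 4] and y = [0, 0, 10, 10], best_split returns (100.0, 2.5). Splitting between 2 and 3 removes the entire squared error of 100 around the mean 5.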
comment:25 Changed 9 years ago by gkronber
r12623: comments
comment:26 Changed 9 years ago by gkronber
r12632: implemented node expansion using a priority queue (and changed parameter MaxDepth to MaxSize). Moved unit tests to a separate project.
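Best-first expansion with a priority queue can be sketched as follows (Python, single input feature, hypothetical names). Instead of expanding every node down to a fixed depth, the builder always splits the queued leaf with the largest SSE improvement. It stops once another split would exceed max_size nodes. Interpreting MaxSize as the total number of nodes is an assumption; the ticket only states that MaxDepth was replaced by MaxSize.

```python
import heapq
import itertools


def _leaf(rows, ys):
    # every node stores the mean target of its rows; leaves use it as prediction
    return {"rows": rows, "value": sum(ys[r] for r in rows) / len(rows)}


def _best_split(rows, xs, ys):
    """Best SSE-reducing threshold on one feature, or None if no useful split."""
    rows = sorted(rows, key=lambda r: xs[r])
    n, total = len(rows), sum(ys[r] for r in rows)
    best, sum_left = None, 0.0
    for i in range(n - 1):
        sum_left += ys[rows[i]]
        if xs[rows[i]] == xs[rows[i + 1]]:
            continue  # cannot separate equal x values
        n_left, n_right = i + 1, n - i - 1
        gain = (sum_left ** 2 / n_left + (total - sum_left) ** 2 / n_right
                - total ** 2 / n)
        if gain > 0 and (best is None or gain > best[0]):
            threshold = (xs[rows[i]] + xs[rows[i + 1]]) / 2.0
            best = (gain, threshold, rows[: i + 1], rows[i + 1:])
    return best


def build_best_first(xs, ys, max_size):
    """Grow a tree by always splitting the queued leaf with the largest gain.

    Stops when another split would exceed max_size nodes or no leaf has a
    useful split. Returns (root, number_of_nodes).
    """
    root = _leaf(list(range(len(xs))), ys)
    tie = itertools.count()  # tie-breaker so heapq never compares dicts
    queue = []

    def enqueue(node):
        split = _best_split(node["rows"], xs, ys)
        if split is not None:
            # heapq is a min-heap, so negate the gain to pop the best split first
            heapq.heappush(queue, (-split[0], next(tie), node, split))

    enqueue(root)
    size = 1
    while queue and size + 2 <= max_size:
        _, _, node, (_, threshold, left_rows, right_rows) = heapq.heappop(queue)
        node["threshold"] = threshold
        node["left"], node["right"] = _leaf(left_rows, ys), _leaf(right_rows, ys)
        size += 2
        enqueue(node["left"])
        enqueue(node["right"])
    return root, size


def predict(root, x):
    # iterative descent: follow thresholds until reaching a node without children
    node = root
    while "threshold" in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["value"]
```

Take x = [1, 2, 3, 4, 5, 6] and y = [0, 0, 5, 5, 20, 20]. With max_size = 3, the builder makes a single split at 4.5. With max_size = 5, it additionally splits the left child at 2.5. Because the queue is ordered by improvement, the resulting tree can be unbalanced, which is what the stack-based rewrite in r12619 prepared for.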
comment:27 Changed 9 years ago by gkronber
r12635: marked potential future efficiency improvements as identified through profiling
comment:28 Changed 9 years ago by gkronber
- Owner changed from gkronber to mkommend
- Status changed from assigned to reviewing
comment:29 Changed 9 years ago by mkommend
Reviewing comments
TODO
- Line search for logistic loss function
- GradientBoostedTreesAlgorithm
- GradientBoostedTreesAlgorithmStatic
- Finish RegressionTreeBuilder
What about weights for the line searches in the loss functions? Either they are not supported or they are ignored!
I guess it would be better to remove them completely. It would be no big issue to add them again later.
RegressionTreeModel.cs
The tree should not be publicly accessible or modifiable (clone in default ctor, no shallow cloning) (r12658)
TreeNode
- Should be immutable (r12658)
- Should override Equals (MSDN link) (r12658 & r12663)
GradientBoostedTreesModel.cs
Is line 66 really necessary? What about: if (!rows.Any()) return Enumerable.Empty<double>(); (r12660)
RegressionTreeBuilder.cs
- Why is it necessary to change y? As a result, multiple calls to CreateRegressionTree and CreateRegressionTreeForGradientBoosting yield different, wrong results!
- Couldn't the RegressionTreeBuilder be implemented as a stateless static class?
- The whole class is not thread-safe.
Yes. GradientBoostedTreesAlgorithmStatic is the stateless, thread-safe facade for RegressionTreeBuilder.
RegressionTreeBuilder should be an internal class (r12661)
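The TreeNode review points (immutability and an Equals override) can be illustrated with a Python analog. A frozen dataclass is immutable and automatically gets field-wise equality and a consistent hash. In C#, the equivalent has to be written by hand, overriding Equals together with GetHashCode. The field names below are hypothetical and not taken from RegressionTreeModel.cs.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class TreeNode:
    """Hypothetical immutable tree node; field names are illustrative only."""

    var_name: str   # input variable used for the split
    value: float    # split threshold (or prediction, for a leaf)
    left_idx: int   # index of the left child in a node array, -1 for none
    right_idx: int  # index of the right child in a node array, -1 for none


def try_modify(node: TreeNode) -> bool:
    """Return True if assigning a field succeeds (never, for a frozen dataclass)."""
    try:
        node.value = 0.0  # type: ignore[misc]
        return True
    except FrozenInstanceError:
        return False
```

Two nodes built from the same field values compare equal and hash identically, so they can safely be used as dictionary keys or compared in unit tests.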
comment:30 Changed 9 years ago by gkronber
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.12
- Version changed from 3.3.10 to branch
comment:31 Changed 9 years ago by mkommend
- Owner changed from mkommend to gkronber
- Status changed from reviewing to assigned
comment:32 Changed 9 years ago by gkronber
r12696: killed all weights
comment:33 Changed 9 years ago by gkronber
r12697: removed line search closure (binding y.ToArray(), and pred.ToArray())
comment:34 Changed 9 years ago by gkronber
r12698: hiding internals of GbmState
comment:35 Changed 9 years ago by gkronber
r12699: improved performance of evaluation for regression tree models
comment:36 Changed 9 years ago by gkronber
comment:37 Changed 9 years ago by gkronber
r12700: copied GBT implementation from branch to trunk
comment:38 Changed 9 years ago by gkronber
- Owner changed from gkronber to mkommend
- Status changed from assigned to reviewing
comment:39 Changed 9 years ago by mkommend
- Owner changed from mkommend to gkronber
- Status changed from reviewing to readytorelease
comment:40 Changed 9 years ago by gkronber
r12710: cached training and test rows in GBT for another speedup of ~1.5 (+renamed test class)
comment:41 Changed 9 years ago by gkronber
comment:42 Changed 9 years ago by gkronber
comment:43 Changed 9 years ago by gkronber
- Resolution set to done
- Status changed from readytorelease to closed
r12329: created branch for GBT implementation