Opened 2 years ago

Closed 21 months ago

#2261 closed feature request (done)

Gradient Boosted Trees

Reported by: gkronber Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.12
Component: Algorithms.DataAnalysis Version: branch
Keywords: Cc:

Description


Change History (43)

comment:1 Changed 2 years ago by gkronber

  • Owner set to gkronber
  • Status changed from new to assigned

comment:2 Changed 2 years ago by gkronber

  • Milestone changed from HeuristicLab 3.3.11 to HeuristicLab 3.3.x Backlog

comment:3 Changed 2 years ago by gkronber

r12329: created branch for GBT implementation

comment:4 Changed 2 years ago by gkronber

r12332: initial import of gradient boosted trees for regression

comment:5 Changed 2 years ago by gkronber

r12342: added cloning method override for previously abstract class RegressionSolution

comment:6 Changed 2 years ago by gkronber

r12349: added serialization constructor for RegressionTreeModel

comment:7 Changed 2 years ago by gkronber

r12371: fixed a small bug (correct mapping of row indices for training partition)

comment:8 Changed 2 years ago by gkronber

r12372: implemented prototype view for gradient boosted trees

comment:9 Changed 2 years ago by gkronber

r12373: added hidden alg.-parameter to disable solution creation (useful for cross-validation)

comment:10 Changed 2 years ago by gkronber

r12374: added absolute and relative error loss functions for GBT.
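
For readers unfamiliar with how a loss function plugs into gradient boosting, the following is a minimal sketch, not the code from r12374. It assumes the usual formulation in which each boosting step fits a tree to the negative gradient of the loss (the pseudo-residuals). All function names are hypothetical, and the relative error loss is assumed to be |y - f| / |y|, which requires non-zero targets.

```python
def sign(v):
    """Return -1.0, 0.0 or 1.0 depending on the sign of v."""
    return (v > 0) - (v < 0)

def absolute_loss(y, f):
    """Sum of absolute errors between targets y and predictions f."""
    return sum(abs(yi - fi) for yi, fi in zip(y, f))

def absolute_pseudo_residuals(y, f):
    """Negative gradient of the absolute loss w.r.t. f: sign(y - f)."""
    return [sign(yi - fi) for yi, fi in zip(y, f)]

def relative_loss(y, f):
    """Sum of |y - f| / |y| (assumed definition; targets must be non-zero)."""
    return sum(abs(yi - fi) / abs(yi) for yi, fi in zip(y, f))

def relative_pseudo_residuals(y, f):
    """Negative gradient of the relative loss w.r.t. f: sign(y - f) / |y|."""
    return [sign(yi - fi) / abs(yi) for yi, fi in zip(y, f)]
```

Both gradients depend only on the sign of the residual, so a large outlier pulls on the next tree no harder than a small error does.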

comment:11 Changed 2 years ago by gkronber

r12375: made TreeNodes structs instead of classes

comment:12 Changed 23 months ago by gkronber

r12378: Merged revision(s) 12333-12365 from trunk/sources:

  • #2373: Corrected typo in RandomBinaryVectorCreator by implementing an after-deserialization-hook.
  • #2359: Refactored pruning operators and analyzers.
  • #2359: Removed commented code from pruning analyzer.
  • #2378: Vertex.cs: Fixed bug in Label setter.
  • #2359: The changes in r12358 look fine to me. Added total number of pruned nodes in the analyzer's data table. Removed unused parameter names in the SymbolicDataAnalysisSingleObjectivePruningAnalyzer.
  • #2345: Fixed x-axis maximum in error characteristics curve.

comment:13 Changed 23 months ago by gkronber

TODO:

  • exception occurring for very large tree depths
  • crash when the test set is empty
  • node-queue instead of full expansion to max depth
Last edited 21 months ago by gkronber

comment:14 Changed 21 months ago by gkronber

r12495: merged changes from trunk to branch

r12494 #2403: added a null check in the MatlabParameterVectorEvaluator to prevent exceptions when clearstate is called


r12493 #2369: added support for squared errors and relative errors to error-characteristic-curve view


r12492 #2392: implemented PearsonsRCalculator to fix incorrect correlation values in the correlation matrix.


r12491 #2402 don't set task state to waiting when it fails


r12490 #2401 added missing Mono.Cecil plugin dependency


r12488 #2400 - Interfaces for Capacitated-, PickupAndDelivery- and TimeWindowed-ProblemInstances now specify an additional penalty parameter to set the current penalty factor for the constraint relaxation. - The setter of the penalty-property in ...


r12485 #2374 made RegressionSolution and ClassificationSolution non-abstract


r12482 #2320: Fixed warnings in unit test solutions introduced in r12420 by marking methods as obsolete.


r12481 #2320: Fixed AfterDeserialization of GEArtifialAntEvaluator.


r12480 #2320: Fixed error in symbolicexpressiontree crossover regarding the wiring of lookup parameters if persisted file is loaded.


r12479 #2397 moved GeoIP project into ExtLibs


r12478 #2329 fixed bug in simple code editor


r12476 #2331 removed outdated plugins


r12475 #2368 fixed compile warnings


r12474 #2399 worked on Mono project prepare script


r12473 #2329 added a simple code editor for Linux


r12472 #2399 - fixed MathJax.js file name - worked on Mono project prepare script


r12471 #2399 worked on Mono project prepare script


r12470 #2399 fixed pre-build events in project files


r12465 #2399 worked on mono project prepare script


r12464 #2399 added patch to project


r12463 #2399 fixed EPPlus so that it compiles on Linux


r12461 #2398: Skip root and start symbols when calculating impacts and replacement values in the pruning operators.


r12458 #2354 show label when no data is displayed and don't show the legend


r12457 #2353 removed duplicated call to Any() in Hive Status page


r12456 #2368 fixed modifier


r12455 #2368 added support in persistence for typecaches in streams


r12445 #2394: Changed Web.config compilation from debug to release to force script bundling. Changed history loading type from lazy to eager loading to increase performance. Fixed "getCoreStatus" typo in statusCtrl.js


r12443 #2394: Fixed UserTaskQuery and GetStatusHistory in the WebApp.Status plugin


r12442 #2394 added nuget folders to svn ignore list


r12435 #2394: Improved PluginManager and updated hive status monitor.


r12434 #2396 added symbolic expression tree formatter for C#


r12433 #2395: Minor change in DoubleValue.GetValue.


r12432 #2395 Use simple round-trip format for doubles because G17 prints some strange numbers (20.22 to 20.219999999999999999). Some accuracy can still be lost on 64bit machines, but should be very rare and minimal. double.MaxValue can still be pa...


r12431 #2395 Fixed parsing issues by using the G17 format.


r12430 #2394 added missing package config


r12429 #2394 added missing package config


r12428 #2394 added web app and status page to trunk


r12424 #2320: Adapted plugin file and updated project file of SymbolicExpressionTreeEncoding.


r12422 #2320: Merged the encoding class and all accompanying changes in the trunk.


r12401 #2387 Fixed a bug where the automatic selection of the first element behaved differently for the NewItemDialog.


r12400 #2387 Forgot to commit a file.


r12399 #2387 - Added context-menu for expanding and collapsing tree-nodes. - Improve response time when expanding/collapsing all nodes for TypeSelector and NewItemDialog.


r12398 #2387 - Added clearSearch-button in TypeSelector. - Adapted behavior of TypeSelector and NewItemDialog that a selected node stays selected as long as it matches the search criteria.


r12397 #2387 - Adapted the matching behavior of the TypeSelector so that it behaves the same as the NewItemDialog. The search string is tokenized by space and matches if all tokens are contained (e.g. "Sym Reg" matches "SymbolicRegression...")...


r12393 #2025 - Removed Expand/CollapseAll buttons. - Removed cycling of items.


r12392 #2386: Updated GetHashCode method in the EnumerableBoolEqualityComparer.


comment:15 Changed 21 months ago by gkronber

r12587: removed everything that is not necessary to review for trunk integration (including experimental views for gradient boosted trees)

comment:16 Changed 21 months ago by gkronber

r12588: merged changes from trunk

comment:17 Changed 21 months ago by gkronber

r12589: adapted interface to use IDataset instead of Dataset and added logistic regression loss function

comment:18 Changed 21 months ago by gkronber

r12590: preparations for trunk integration (adapt to current trunk version, add license headers, add comments, improve code quality)

comment:19 Changed 21 months ago by gkronber

r12591: fixed slow evaluation, even when results are already cached

comment:20 Changed 21 months ago by gkronber

r12597: comments and minor improvements

comment:21 Changed 21 months ago by gkronber

r12607: also use line search function for the initial estimation f0, changed logistic regression loss function to match description in GBM paper, comments and code improvements
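
As background for the logistic loss mentioned here, this is a minimal sketch of the two-class formulation from Friedman's gradient boosting paper, with labels in {-1, +1}. It is not the code from r12607: the function names are hypothetical, and the initial estimate uses the paper's closed form rather than the line search function the commit refers to.

```python
import math

def logistic_loss(y, f):
    """Sum of log(1 + exp(-2 * y * F)) for labels y in {-1, +1}."""
    return sum(math.log(1.0 + math.exp(-2.0 * yi * fi)) for yi, fi in zip(y, f))

def initial_estimate(y):
    """F0 = 0.5 * ln((1 + mean(y)) / (1 - mean(y))).

    Undefined when all labels are identical (mean is -1 or +1).
    """
    mean = sum(y) / len(y)
    return 0.5 * math.log((1.0 + mean) / (1.0 - mean))

def pseudo_residuals(y, f):
    """Negative gradient of the loss: 2 * y / (1 + exp(2 * y * F))."""
    return [2.0 * yi / (1.0 + math.exp(2.0 * yi * fi)) for yi, fi in zip(y, f)]

def leaf_value(residuals):
    """Single Newton step for a leaf: sum(r) / sum(|r| * (2 - |r|))."""
    numerator = sum(residuals)
    denominator = sum(abs(r) * (2.0 - abs(r)) for r in residuals)
    return numerator / denominator
```

The resulting F is half the log-odds, so a class probability would follow from 1 / (1 + exp(-2 * F)).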

comment:22 Changed 21 months ago by gkronber

r12611: produce classification solution (discriminant function) when using logistic regression loss

comment:23 Changed 21 months ago by gkronber

r12619: replace recursion by a stack to prepare for unbalanced tree expansion
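
The general idea of replacing recursion by an explicit stack can be sketched as follows. This is an illustration only, not the code from r12619; the dictionary-based node layout and the function name are assumptions made for the example.

```python
def count_leaves(root):
    """Count the leaves of a binary tree without recursion.

    Nodes are dicts with 'left' and 'right' entries (None when absent).
    """
    leaves = 0
    stack = [root]  # explicit stack replaces the call stack
    while stack:
        node = stack.pop()
        left, right = node["left"], node["right"]
        if left is None and right is None:
            leaves += 1
            continue
        # Push the right child first so the left child is visited first.
        if right is not None:
            stack.append(right)
        if left is not None:
            stack.append(left)
    return leaves
```

Because the pending nodes live in an ordinary list, the depth of the tree is limited only by available memory, and the visiting order can later be changed by swapping the stack for another container.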

comment:24 Changed 21 months ago by gkronber

r12620: corrected check if a split is useful, added a unit test class and added an elaborate comment on split quality calculation
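
The comment added in r12620 is not reproduced in this ticket. As a rough illustration of what a split quality calculation for squared error commonly looks like, the sketch below scores a split by sL²/nL + sR²/nR, where s is the sum of targets and n the row count on each side. Maximizing this score is equivalent to minimizing the summed squared error of the two children, because the sum of squared targets is the same for every split. The function name and the "useful split" check (the score must beat the unsplit node) are assumptions.

```python
def best_split(x, y):
    """Find the threshold on feature x that best separates targets y.

    Returns (threshold, score); threshold is None when no split improves
    on the unsplit node.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: x[i])
    total = sum(y)
    best_score = total * total / n  # score of leaving the node unsplit
    best_threshold = None
    sum_left = 0.0
    for k in range(n - 1):
        i, j = order[k], order[k + 1]
        sum_left += y[i]
        if x[i] == x[j]:
            continue  # cannot split between identical feature values
        n_left = k + 1
        sum_right = total - sum_left
        score = sum_left ** 2 / n_left + sum_right ** 2 / (n - n_left)
        if score > best_score:
            best_score = score
            best_threshold = (x[i] + x[j]) / 2
    return best_threshold, best_score
```

For x = [1, 2, 3, 4] and y = [1, 1, 5, 5] this returns the threshold 2.5, which separates the two target levels exactly.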

comment:25 Changed 21 months ago by gkronber

r12623: comments

comment:26 Changed 21 months ago by gkronber

r12632: implemented node expansion using a priority queue (and changed parameter MaxDepth to MaxSize). Moved unit tests to a separate project.
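
Best-first node expansion with a size limit can be sketched as below. This is a simplified, single-feature illustration and not the code from r12632; it assumes that MaxSize bounds the total number of tree nodes, and every name in it is hypothetical.

```python
import heapq
from itertools import count

def sse(ys):
    """Sum of squared deviations from the mean."""
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)

def best_gain(points):
    """Best SSE reduction for (x, y) points sorted by x: (gain, index)."""
    ys = [p[1] for p in points]
    base = sse(ys)
    best = (0.0, None)
    for k in range(1, len(points)):
        if points[k - 1][0] == points[k][0]:
            continue  # no threshold between identical x values
        gain = base - sse(ys[:k]) - sse(ys[k:])
        if gain > best[0]:
            best = (gain, k)
    return best

def grow_tree(points, max_size):
    """Expand the most promising leaf first until max_size nodes exist.

    Returns the leaves, each as a list of (x, y) points.
    """
    tie_breaker = count()  # keeps heap entries comparable on equal gains
    heap, leaves = [], []

    def push(pts):
        gain, k = best_gain(pts)
        if k is None:
            leaves.append(pts)  # nothing to gain, final leaf
        else:
            heapq.heappush(heap, (-gain, next(tie_breaker), k, pts))

    push(sorted(points))
    size = 1
    while heap and size + 2 <= max_size:  # each split adds two nodes
        _, _, k, pts = heapq.heappop(heap)
        push(pts[:k])
        push(pts[k:])
        size += 2
    leaves.extend(entry[3] for entry in heap)  # unexpanded candidates
    return leaves
```

Compared with expanding every branch to a fixed depth, this spends the node budget where the error reduction is largest, so the resulting trees may be unbalanced.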

comment:27 Changed 21 months ago by gkronber

r12635: marked potential future efficiency improvements as identified through profiling

comment:28 Changed 21 months ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from assigned to reviewing

Only the branch "GBT-trunkintegration" has to be reviewed.

The following changesets have been applied only to GBT-trunkintegration:

  • r12585 (create branch)
  • r12587:12591 (remove everything unnecessary and adapt to current trunk version)
  • r12597
  • r12607
  • r12611
  • r12619
  • r12620
  • r12623
  • r12632
  • r12635

comment:29 Changed 21 months ago by mkommend

Reviewing comments

TODO

  • Line search for logistic loss function
  • GradientBoostedTreesAlgorithm
  • GradientBoostedTreesAlgorithmStatic
  • Finish RegressionTreeBuilder

What about weights for the line searches in the loss functions? They are either not supported or ignored.

I guess it would be better to remove them completely. No big issue to add them later again. (r12696)

RegressionTreeModel.cs

  • The tree should not be publicly accessible or modifiable (clone in default ctor, no shallow cloning) (r12658)

TreeNode

GradientBoostedTreesModel.cs

  • Is line 66 really necessary? What about: if (!rows.Any()) return Enumerable.Empty<double>(); (r12660)

RegressionTreeBuilder.cs

  • Why is it necessary to change y? As a result multiple calls to CreateRegressionTree and CreateRegressionTreeForGradientBoosting yield different, wrong results!!!
  • Couldn't the RegressionTreeBuilder be implemented as a stateless static class?
  • The whole class is not thread-safe

    Yes. GradientBoostedTreesAlgorithmStatic is the stateless, thread-safe facade for RegressionTreeBuilder. RegressionTreeBuilder should be an internal class (r12661)
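
The concern about changing y can be illustrated with a small, generic sketch. It does not show how RegressionTreeBuilder actually modified its input; both functions are hypothetical and only contrast an in-place update with a side-effect-free one.

```python
def residuals_in_place(y, pred):
    """Problematic pattern: overwrites the caller's target values."""
    for i in range(len(y)):
        y[i] -= pred[i]
    return y

def residuals(y, pred):
    """Side-effect-free variant: returns a new list, y stays untouched."""
    return [yi - pi for yi, pi in zip(y, pred)]

targets = [3.0, 5.0]
pred = [1.0, 1.0]

# Repeated calls to the pure function agree with each other.
first = residuals(targets, pred)
second = residuals(targets, pred)

# Repeated calls to the in-place version drift, because the input changes.
scratch = list(targets)
drift_first = list(residuals_in_place(scratch, pred))
drift_second = list(residuals_in_place(scratch, pred))
```

A function that neither keeps state nor writes to its arguments can also be called from several threads at once, which is the property the review asks for.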

Last edited 21 months ago by gkronber

comment:30 Changed 21 months ago by gkronber

  • Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.12
  • Version changed from 3.3.10 to branch

comment:31 Changed 21 months ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to assigned

comment:32 Changed 21 months ago by gkronber

r12696: killed all weights

comment:33 Changed 21 months ago by gkronber

r12697: removed line search closure (binding y.ToArray(), and pred.ToArray())

comment:34 Changed 21 months ago by gkronber

r12698: hiding internals of GbmState

comment:35 Changed 21 months ago by gkronber

r12699: improved performance of evaluation for regression tree models

comment:36 Changed 21 months ago by gkronber

Made some timings using the unit tests

Unit test               Iterations  Before r12699  With column caching (r12699)  With column caching + no final solution
Tower & absolute error  1000        11.8s          9.6s                          7.3s
Tower & relative error  3000        39.7s          33.8s                         26.8s
Tower & squared error   5000        53.4s          43.8s                         32.1s
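
The speedup factors implied by these timings can be worked out as below. The numbers are taken from the table above; the dictionary layout and function name are only for illustration.

```python
# Seconds per unit test: (before r12699, column caching, caching + no final solution)
timings = {
    "Tower & absolute error": (11.8, 9.6, 7.3),
    "Tower & relative error": (39.7, 33.8, 26.8),
    "Tower & squared error": (53.4, 43.8, 32.1),
}

def speedups(row):
    """Return the speedup of each variant relative to the first timing."""
    before = row[0]
    return tuple(round(before / t, 2) for t in row[1:])

for name, row in timings.items():
    caching, caching_no_solution = speedups(row)
    print(f"{name}: {caching}x with caching, {caching_no_solution}x without final solution")
```

Column caching alone gives roughly 1.2x, and skipping the final solution as well gives roughly 1.5x to 1.7x.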

comment:37 Changed 21 months ago by gkronber

r12700: copied GBT implementation from branch to trunk

comment:38 Changed 21 months ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from assigned to reviewing

comment:39 Changed 21 months ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to readytorelease

Reviewed r12696, r12697, r12698, and r12699.

Please remove the branch before releasing this ticket.

comment:40 Changed 21 months ago by gkronber

r12710: cached training and test rows in GBT for another speedup of ~1.5x (also renamed the test class)

comment:41 Changed 21 months ago by gkronber

r12711: merged r12700 and r12710 from trunk to stable

comment:42 Changed 21 months ago by gkronber

r12712 and r12713 deleted old GBT branches

comment:43 Changed 21 months ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed