Opened 4 years ago

Closed 4 years ago

#1968 closed enhancement (done)

The number of variables used per tree should be configurable in random forest modeling

Reported by: mkommend Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.8
Component: Algorithms.DataAnalysis Version: 3.3.8
Keywords: Cc:

Description (last modified by mkommend)

Another problem is that the random seed currently cannot be specified, so the results of a random forest modeling run are not reproducible.
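Random forest training is randomized. For example, each tree is usually built from a random sample of the training rows, so a run can only be repeated if the random number generator starts from a known seed. The Python sketch below is only an illustration and uses a hypothetical helper `bootstrap_sample`; it is not HeuristicLab or ALGLIB code.

```python
import random


def bootstrap_sample(n_rows, seed=None):
    """Return n_rows row indices drawn with replacement (a bootstrap sample).

    A new random.Random is created per call: the same integer seed always
    gives the same sample, while seed=None seeds from OS randomness (or the
    system time), so repeated unseeded calls generally differ.
    """
    rng = random.Random(seed)
    return [rng.randrange(n_rows) for _ in range(n_rows)]


# Fixing the seed makes the "run" reproducible.
assert bootstrap_sample(10, seed=42) == bootstrap_sample(10, seed=42)
```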

Change History (12)

comment:1 Changed 4 years ago by mkommend

  • Description modified
  • Status changed from new to accepted

comment:2 Changed 4 years ago by mkommend

r8786: Added the seed and M parameters to random forest modeling.
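The M parameter controls how many of the input variables are considered when the forest grows a tree. A minimal sketch of drawing such a subset from a seeded generator might look like the following; `choose_variables` is a hypothetical name, and the sketch does not model whether a real implementation redraws the subset per tree or per split.

```python
import random


def choose_variables(n_vars, m, rng):
    """Pick m distinct input-variable indices out of n_vars (hypothetical helper)."""
    if not 1 <= m <= n_vars:
        raise ValueError("m must be between 1 and the number of variables")
    # sample() draws without replacement, so no variable is picked twice.
    return sorted(rng.sample(range(n_vars), m))


rng = random.Random(1968)  # fixed seed, so the subset is reproducible
subset = choose_variables(10, 3, rng)
```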

comment:3 Changed 4 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from accepted to reviewing

Please take a detailed look at the locking of the RNG, which enables specifying a seed value, and feel free to rename the M parameter.

comment:4 Changed 4 years ago by mkommend

  • Owner changed from gkronber to mkommend
  • Status changed from reviewing to assigned

comment:5 follow-up: Changed 4 years ago by mkommend

  • Status changed from assigned to accepted

There is a problem if multiple random forest regression algorithms run in parallel: the results are not guaranteed to be reproducible.

comment:6 in reply to: ↑ 5 Changed 4 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from accepted to assigned

Replying to mkommend:

> There is a problem if multiple random forest regression algorithms run in parallel: the results are not guaranteed to be reproducible.

I had a further look at the problem, and IMHO we either have to add a [ThreadStatic] attribute to the RNG in the ALGLIB sources or remove the seed from the parameter list, which would lead to non-reproducible random forest runs.
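Why a fixed seed is not enough when runs execute in parallel: if all runs draw from one process-wide generator, the numbers each run receives depend on how the scheduler interleaves the draws. A lock only makes the individual draws safe; it does not fix their order. The deterministic Python simulation below (hypothetical function `run_with_schedule`, no real threads) replays two interleavings of the same seeded generator.

```python
import random


def run_with_schedule(seed, schedule):
    """Simulate two algorithm runs, "A" and "B", that share ONE global RNG.

    `schedule` fixes the order in which the runs draw numbers, standing in
    for the thread scheduler, which in reality is not under our control.
    """
    shared_rng = random.Random(seed)  # one generator for all runs: the problem
    draws = {"A": [], "B": []}
    for run in schedule:
        draws[run].append(shared_rng.random())
    return draws


first = run_with_schedule(seed=0, schedule="AABB")
second = run_with_schedule(seed=0, schedule="ABAB")
# Same seed, different interleaving: run A no longer sees the same numbers.
assert first["A"] != second["A"]
```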

comment:7 Changed 4 years ago by mkommend

  • Owner changed from gkronber to mkommend

It was decided to change the ALGLIB sources to use [ThreadStatic] for the RNG.

comment:8 Changed 4 years ago by mkommend

  • Status changed from assigned to accepted

comment:9 Changed 4 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from accepted to reviewing

r8803: Added [ThreadStatic] to the RNG of ALGLIB and removed the lock from the random forest algorithm.
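The idea behind marking the RNG [ThreadStatic] is that every thread gets its own generator, so a run seeded on one thread cannot be disturbed by draws made on another, and no lock is needed. As a rough Python analogue (not ALGLIB's C# code), `threading.local` plays the role of [ThreadStatic]; `library_random` and `run` are hypothetical names, and it is an assumption here that each run installs its seeded generator on its own thread before training starts.

```python
import random
import threading

# Stand-in for the library's global RNG field. threading.local gives every
# thread its own "rng" attribute, mirroring C#'s [ThreadStatic].
_state = threading.local()


def library_random():
    """Hypothetical library routine that draws from the 'global' RNG."""
    return _state.rng.random()


def run(seed, n_draws, results, key):
    _state.rng = random.Random(seed)  # visible only to the current thread
    results[key] = [library_random() for _ in range(n_draws)]


results = {}
threads = [
    threading.Thread(target=run, args=(7, 1000, results, key)) for key in ("A", "B")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Both runs used seed 7, so both see the same sequence despite running concurrently.
```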

The ALGLIB source file was automatically reformatted according to my local settings. However, the only line actually changed is line 494 of ap.cs.

comment:10 Changed 4 years ago by mkommend

r8805: Added initialization code for the RNG in the ALGLIB sources.
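One likely reason a [ThreadStatic] field needs explicit initialization code: in C#, an initializer on such a field runs only once, as part of the static constructor, so only the thread that triggered it sees an initialized value and every other thread sees null. Whether r8805 addresses exactly this is an assumption. The Python sketch below shows the same pitfall with `threading.local` and a lazy per-thread getter (`get_rng` is a hypothetical name).

```python
import random
import threading

_state = threading.local()
# Like a field initializer: runs once, so only the importing thread gets an RNG.
_state.rng = random.Random()


def get_rng():
    """Return this thread's RNG, creating it lazily on first use."""
    rng = getattr(_state, "rng", None)
    if rng is None:
        rng = random.Random()  # seeded from OS randomness or the system time
        _state.rng = rng
    return rng


seen = {}


def worker():
    # A fresh thread does not see the attribute set by the importing thread.
    seen["had_rng_before"] = hasattr(_state, "rng")
    seen["value"] = get_rng().random()


t = threading.Thread(target=worker)
t.start()
t.join()
```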

comment:11 Changed 4 years ago by gkronber

  • Status changed from reviewing to readytorelease

Reviewed r8805, r8803, r8786.

comment:12 Changed 4 years ago by swagner

  • Resolution set to done
  • Status changed from readytorelease to closed
  • Version changed from 3.3.7 to 3.3.8