Opened 6 years ago

Closed 6 years ago

#2902 closed enhancement (done)

Performance improvement of NaN/Inf-check on a double-matrix

Reported by: fholzing Owned by: gkronber
Priority: low Milestone: HeuristicLab 3.3.16
Component: Algorithms.DataAnalysis Version: trunk
Keywords: Performance Cc:

Description (last modified by fholzing)

The current implementation that checks whether a double[,] contains any NaN/Inf values is rather time-consuming.

The check is currently written as inputMatrix.Cast<double>().Any(...). At first glance, it takes about 6 seconds for a 50000x500 matrix. After consulting mkommend, a faster alternative would be preferred.

Attachments (1)

Performance_Release.xlsx (22.0 KB) - added by fholzing 6 years ago.
Performance comparison

Change History (22)

comment:1 Changed 6 years ago by fholzing

  • Type changed from defect to enhancement

comment:2 Changed 6 years ago by fholzing

  • Status changed from new to accepted

comment:3 Changed 6 years ago by fholzing

  • Description modified (diff)

comment:4 Changed 6 years ago by fholzing

There is a total of 11 occurrences in the HeuristicLab.Algorithms.DataAnalysis-project.

Changed 6 years ago by fholzing

Performance comparison

comment:5 Changed 6 years ago by fholzing

After a first benchmark run the difference between the current approach (mentioned in the description) and three alternatives seems to be approximately a factor of 50.
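The benchmark code itself is only in the attached .xlsx, so the following is a minimal sketch of how such a comparison could be set up, not the measurement that was actually run. CheckWithCast follows the expression from the description; CheckWithLoops, Measure and the reduced 5000x500 matrix size are assumptions made for illustration, and the three alternatives mentioned above are not necessarily this one.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class NanCheckBenchmark {
  // Approach from the ticket description: LINQ Cast + Any.
  public static bool CheckWithCast(double[,] m) {
    return m.Cast<double>().Any(x => double.IsNaN(x) || double.IsInfinity(x));
  }

  // Hypothetical alternative: plain nested index loops.
  public static bool CheckWithLoops(double[,] m) {
    int rows = m.GetLength(0), cols = m.GetLength(1);
    for (int i = 0; i < rows; i++) {
      for (int j = 0; j < cols; j++) {
        if (double.IsNaN(m[i, j]) || double.IsInfinity(m[i, j])) return true;
      }
    }
    return false;
  }

  // Times a single call of the given check (includes JIT cost on the first call).
  public static TimeSpan Measure(Func<double[,], bool> check, double[,] m) {
    var sw = Stopwatch.StartNew();
    check(m);
    sw.Stop();
    return sw.Elapsed;
  }

  public static void Main() {
    // All zeros, so both checks have to scan the whole matrix.
    var matrix = new double[5000, 500];
    Console.WriteLine("Cast/Any:     " + Measure(CheckWithCast, matrix));
    Console.WriteLine("nested loops: " + Measure(CheckWithLoops, matrix));
  }
}
```

For numbers that are comparable to the ones reported here, the checks would have to be run in a Release build, warmed up once, and repeated several times.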

comment:6 Changed 6 years ago by fholzing

Additional note of importance: if you ever need a very large array (more than 2 GB in total size), you have to enable gcAllowVeryLargeObjects (see https://msdn.microsoft.com/de-de/library/hh285054(v=vs.110).aspx)

comment:7 Changed 6 years ago by fholzing

My approach would be the following: extend the class ObjectExtensions (Common project) with the iterator method (as shown in the .xlsx; imho it is the easiest and most readable one, with very little performance impact) and use the new extension method for all 11 occurrences.
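As a rough illustration of what such an extension method could look like: the sketch below assumes that the iterator variant is a foreach over the matrix, which the C# compiler can enumerate without boxing the elements. The class name MatrixExtensions is a placeholder (the proposal is to put the method into ObjectExtensions), and the committed code may differ. The method name follows the one settled on in the review.

```csharp
using System;

// Placeholder class; the ticket proposes adding the method to ObjectExtensions.
public static class MatrixExtensions {
  // Returns true as soon as one element is NaN or +/- infinity.
  public static bool ContainsNanOrInfinity(this double[,] matrix) {
    foreach (double value in matrix) {
      if (double.IsNaN(value) || double.IsInfinity(value)) return true;
    }
    return false;
  }
}

public static class Demo {
  public static void Main() {
    var clean = new double[,] { { 1.0, 2.0 }, { 3.0, 4.0 } };
    var dirty = new double[,] { { 1.0, double.NaN }, { 3.0, 4.0 } };
    Console.WriteLine(clean.ContainsNanOrInfinity()); // False
    Console.WriteLine(dirty.ContainsNanOrInfinity()); // True
  }
}
```

A call site would then change from inputMatrix.Cast<double>().Any(...) to inputMatrix.ContainsNanOrInfinity().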

comment:8 Changed 6 years ago by gkronber

Ok, go for it but please use double.IsNaN instead of Double.IsNaN.

It would be great to know why there is such a big difference between the current code (Cast<double>) and the iterator.

comment:9 Changed 6 years ago by mkommend

I suspect that it either has to do with the lambda call or more likely with boxing involved, because the Cast extension method is defined on IEnumerable instead of IEnumerable<double>.
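A small sketch that makes the boxing suspicion visible (the class name BoxingDemo is made up for this example): a double[,] only offers the non-generic IEnumerable, so every element handed to Cast<double> arrives as an object and has to be unboxed again.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo {
  public static void Main() {
    object matrix = new double[,] { { 1.0, 2.0 }, { 3.0, 4.0 } };
    object vector = new double[] { 1.0, 2.0 };

    // Only one-dimensional arrays implement the generic interface.
    Console.WriteLine(vector is IEnumerable<double>); // True
    Console.WriteLine(matrix is IEnumerable<double>); // False
    Console.WriteLine(matrix is IEnumerable);         // True

    // The non-generic enumerator returns each element as object,
    // i.e. every double is boxed before Cast<double> unboxes it again.
    IEnumerator e = ((IEnumerable)matrix).GetEnumerator();
    e.MoveNext();
    object current = e.Current;
    Console.WriteLine(current.GetType()); // System.Double
  }
}
```

This shows where the allocations come from; it does not measure how much of the factor of 50 is due to boxing and how much is due to the lambda call.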

comment:10 Changed 6 years ago by gkronber

Aha, boxing seems to be a good explanation.

comment:11 Changed 6 years ago by fholzing

r15783: Changed from Cast to Iterator and adapted all occurrences.

comment:12 Changed 6 years ago by fholzing

FieldTest:
Random Forest Regression (Generated Testbed, 50000x500, Test/Train Datapartitions chosen with 0-20000 / 20000-40000)
M: 0,5
NoTree: 200
R: 0,2

Performance for Variable Impact View
Before optimization: ~1323 sec
After optimization: ~ 762 sec
Without check: ~ 749 sec

comment:13 Changed 6 years ago by fholzing

  • Owner changed from fholzing to mkommend
  • Status changed from accepted to reviewing

comment:14 Changed 6 years ago by gkronber

  • Owner changed from mkommend to fholzing
  • Status changed from reviewing to assigned

Reviewed r15783.

  • I found no other references to .Cast<double>, so this is fine.
  • I would prefer the name .ContainsNanOrInfinity()

comment:15 Changed 6 years ago by fholzing

r15786: Renamed ContainsNanInf to ContainsNanOrInfinity

comment:16 Changed 6 years ago by fholzing

  • Owner changed from fholzing to mkommend
  • Status changed from assigned to reviewing

comment:17 Changed 6 years ago by fholzing

  • Owner changed from mkommend to gkronber

comment:18 Changed 6 years ago by gkronber

Reviewed r15786

comment:19 Changed 6 years ago by gkronber

  • Status changed from reviewing to readytorelease

comment:20 Changed 6 years ago by gkronber

r15788: merged r15783 and r15786 from trunk to stable

comment:21 Changed 6 years ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed