#2809 closed enhancement (done)

Remove "Logic" Classes in DataPreprocessing

Reported by: pfleck Owned by: pfleck
Priority: medium Milestone: HeuristicLab 3.3.15
Component: DataPreprocessing Version: trunk
Keywords: Cc:

Description

The "Logics" in the DataPreprocessing encapsulates different aspects of how the DataPreprocessingData can be accessed and manipulated. However, the "Logic"-classes mostly contain some helper methods that are only used within a single corresponding view. The logic classes should be removed and the code moved to the corresponding views.

By removing the "Logics", the initialization of the DataPreprocessingView should be simplified to avoid the highly coupled and entangled object initialization. Generally, the overall design of the data preprocessing should be more like the rest of HL.

Change History (25)

comment:1 Changed 17 months ago by pfleck

  • Status changed from new to accepted

comment:2 Changed 17 months ago by pfleck

r15268 Added branch with the DataPreprocessing plugins.

comment:3 Changed 17 months ago by pfleck

r15269: Removed SearchLogic

comment:4 Changed 17 months ago by pfleck

r15270: Removed TransactionalPreprocessingData and moved relevant code to PreprocessingData.

comment:5 Changed 17 months ago by pfleck

r15274:

  • Removed FilterLogic.
  • Made Contents storable and implemented proper cloning.

comment:6 Changed 17 months ago by pfleck

r15283: Removed StatisticsLogic.

comment:7 Changed 17 months ago by pfleck

r15285 removed ManipulationLogic

  • Moved "ReplaceByAverage/Random/..." to DataGridContent only (functions are redundant in manipulation view)
    • Refactored some code for replacing values with ...
  • Removed Smoothing (not very usefull and changes the data incomprehensibly)
  • Moved Shuffling to DataGridContent
  • Moved "Delete Rows/Columns with insufficient ..." to ManipulationContent
  • Removed Logic Folder
    • Moved Filter folder out

comment:8 Changed 17 months ago by pfleck

r15291: Added (Double/String/DateTime)PreprocessingDataColumn. (experimental state)

comment:9 Changed 17 months ago by pfleck

r15309 Worked on type-save PreprocessingDataColumns.

comment:10 Changed 14 months ago by pfleck

r15431: Removed experimental static-typed datacolumns. (reverse merge g15291, r15309)

comment:11 Changed 14 months ago by pfleck

  • Owner changed from pfleck to bburlacu
  • Status changed from accepted to reviewing

comment:12 Changed 14 months ago by bburlacu

  • Owner changed from bburlacu to pfleck
  • Status changed from reviewing to assigned

The functionality is fine and consistent with the old behavior (tested side by side with the SARCOS problem data).

Some nitpicks:

  • FilteredPreprocessingData.cs:
    • Line 45: typo "preporcessingData"
  • Filtering logic seems unnecessarily complicated
  • FilterView.cs:
    • Line 86: this piece of code is a little misleading and the whole logic in my opinion counter-intuitive

result1[row] = isAndCombination ? result1[row] || filterResult[row] : result1[row] && filterResult[row];

  • the actual result is not the rows for which the filter combination is valid, but the rows for which the filter combination is NOT valid (and these rows get deleted in the SetFilter() method)
  • i think the intent should be more clear and the result explicitly negated in order to get the rows which should be deleted
  • names like "result1" should be more clear (eg., rowsToDelete, filteredRows)

comment:13 Changed 14 months ago by mkommend

  • Milestone changed from HeuristicLab 3.3.16 to HeuristicLab 3.3.15

comment:14 Changed 13 months ago by pfleck

  • Status changed from assigned to accepted

comment:15 Changed 13 months ago by pfleck

r15466: Simplified the overall filtering logic as suggested by bburlacu

  • changed parameter names to actively reflect that filter means "remaining"
  • moved filter combination logic to FilterContent
  • simplified/restructured code

comment:16 Changed 13 months ago by pfleck

  • Owner changed from pfleck to bburlacu
  • Status changed from accepted to reviewing

comment:17 Changed 13 months ago by bburlacu

  • Owner changed from bburlacu to pfleck
  • Status changed from reviewing to assigned
  • unchecking a filter results in the wrong number of remaining and total rows displayed in the filter preview
  • filtering is quite slow (takes ~7 seconds on my machine to remove 15k rows after filtering) - this problem also exists in the trunk version of the DataPreprocessing)

EDIT: The slowness is caused by the calls to IList.RemoveAt() which is called in succession for each row that needs to be removed inside the PreprocessingData.DeleteRow() method. Simply replacing the original values with the filtered values (instead of by row removal) results in a massive speed improvement.

Last edited 13 months ago by bburlacu (previous) (diff)

comment:18 Changed 13 months ago by pfleck

  • Status changed from assigned to accepted

comment:19 Changed 13 months ago by pfleck

r15489

  • Fixed wrong number of remaining/toral rows after filter changed
  • Improved performance for filtering as suggested by bburlacu. Thanks for the patch.
  • Start all charts with index zero.

comment:20 Changed 13 months ago by pfleck

  • Owner changed from pfleck to bburlacu
  • Status changed from accepted to reviewing

comment:21 Changed 12 months ago by pfleck

  • Owner changed from bburlacu to pfleck
  • Version changed from branch to trunk

r15518: merged branch to trunk

comment:22 Changed 12 months ago by pfleck

r15534: Fixed a potential InvalidOperationException when applying filters

comment:23 Changed 12 months ago by pfleck

  • Status changed from reviewing to readytorelease

comment:24 Changed 12 months ago by pfleck

r15535: merged r15518,r15534 to stable

comment:25 Changed 12 months ago by pfleck

  • Resolution set to done
  • Status changed from readytorelease to closed

r15536 deleted branch

Note: See TracTickets for help on using tickets.