Opened 2 years ago

Closed 2 years ago

#2335 closed enhancement (done)

Data Preprocessing Improvements

Reported by: ehopf
Owned by: mkommend
Priority: medium
Milestone: HeuristicLab 3.3.12
Component: DataPreprocessing
Version: 3.3.12
Keywords:
Cc:

Description (last modified by ehopf)

Various data preprocessing improvements, especially for data sets with many missing values. This ticket also includes fixes for defects found in the data preprocessing components.

Defects

(1) Problem data generation fails with an Exception if a user deletes one or more columns.
(2) The generated classification problem data doesn't keep the user-defined positive class.
(3) Editing a cell in the DataGrid-View leads to an Exception.
(4) An Exception occurs if a user switches to the Histogram-View after deleting a column (only if the user had already opened the Histogram-View before the deletion).
(5) Manipulation-View: Wrong column count in the preview of "Delete Columns with insufficient Information" and "Delete Columns with insufficient Variance".
(6) The Statistics-View doesn't update the statistics if a Filter is applied to the data.
(7) The Export Problem button is deactivated.
(8) Unexpected row deletion behavior if the data is sorted (DataGrid-View).
(9) An Exception occurs if the user deletes more than half of the rows in the DataGrid-View.

Features

(1) An option to display missing values within the Histogram view.
(2) A possibility to delete columns in the DataGrid view.
(3) Display missing value information in the statistics available in the DataGrid view.

Change History (42)

comment:1 Changed 2 years ago by ehopf

  • Version changed from 3.3.11 to branch

comment:2 Changed 2 years ago by ehopf

  • Status changed from new to accepted

comment:3 Changed 2 years ago by ehopf

r12051: Branched HL.DataPreprocessing to implement improvements

comment:4 Changed 2 years ago by ehopf

r12052: created the HL.DataPreprocessing project subfolder in the DataPreprocessing branch

comment:5 Changed 2 years ago by ehopf

r12053: moved the HL.DataPreprocessing project to the appropriate subfolder

comment:6 Changed 2 years ago by ehopf

r12054: Branched HeuristicLab.DataPreprocessing.Views to implement the improvements

comment:7 Changed 2 years ago by ehopf

r12056: Created a solution file and adjusted the project settings of the DataPreprocessing Branch

comment:8 Changed 2 years ago by ehopf

r12058: Changed the logic of the ProblemDataCreator to avoid accessing a column that the user has previously deleted. Fixes Defect 1.

Last edited 2 years ago by ehopf (previous) (diff)

comment:9 Changed 2 years ago by ehopf

  • Description modified (diff)

comment:10 Changed 2 years ago by ehopf

r12062: Merged r12059 into branch.

comment:11 Changed 2 years ago by ehopf

  • Description modified (diff)

comment:12 Changed 2 years ago by ehopf

r12063: Added code to keep the user defined positive class in the generated problem data of a classification problem. (Defect 2 fix)

comment:13 Changed 2 years ago by ehopf

  • Description modified (diff)

comment:14 Changed 2 years ago by ehopf

r12158: Added a possibility to delete columns in the DataGrid view. (Feature 2)

comment:15 Changed 2 years ago by ehopf

r12160: Branched HL.Data.Views to implement minor improvements to the StringConvertibleMatrixView-class.

comment:16 Changed 2 years ago by ehopf

r12161: Adjusted the project settings of the DataPreprocessing Branch.

comment:17 Changed 2 years ago by ehopf

r12164: Adjusted the project settings of the DataPreprocessing Branch.

comment:18 Changed 2 years ago by ehopf

r12165: Encapsulated sort column and statistics generation behavior in StringConvertibleMatrixView.cs. Additionally fixed the statistic measures regarding missing values.

comment:19 Changed 2 years ago by ehopf

r12167: Swapped the column left-click and right-click behavior in the DataGrid view to be consistent with the row selection behavior.

comment:20 Changed 2 years ago by ehopf

r12168: Fixes a problem with r12167 that occurs when only missing values are selected.

comment:21 Changed 2 years ago by ehopf

r12169: Added an option to display the missing value count within the Histogram view. (Feature 1)

comment:22 Changed 2 years ago by ehopf

  • Description modified (diff)

comment:23 Changed 2 years ago by ehopf

r12500: Added code that fixes Defect 3 and prevents an endless loop.

comment:24 Changed 2 years ago by ehopf

r12501: Adjusted the code of the Histogram-View to prevent an outdated CheckedItemList after the deletion of a column (Defect 4).

comment:25 Changed 2 years ago by ehopf

r12502: Corrected the preview column count in the Manipulation-View (Defect 5) and added some additional information.

comment:26 Changed 2 years ago by ehopf

  • Description modified (diff)

comment:27 Changed 2 years ago by ehopf

  • Description modified (diff)

comment:28 Changed 2 years ago by ehopf

r12543: The correct row is now deleted when the data is sorted (Defect 8).

Last edited 2 years ago by ehopf (previous) (diff)
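
As an illustration of the kind of fix Defect 8 calls for, here is a minimal sketch, assuming the defect comes from treating a display position in the sorted view as an index into the unsorted data. It is written in Python rather than the project's C#, it is not the code from r12543, and every name in it (delete_displayed_rows, view_to_data, sort_key) is hypothetical.

```python
def delete_displayed_rows(rows, sort_key, displayed_indices):
    """Delete rows that were selected by their position in the sorted view."""
    # view_to_data[i] is the index in `rows` of the row shown at display position i.
    view_to_data = sorted(range(len(rows)), key=lambda i: sort_key(rows[i]))
    # Translate display positions into underlying data indices before deleting.
    doomed = {view_to_data[i] for i in displayed_indices}
    return [row for i, row in enumerate(rows) if i not in doomed]


rows = [("c", 3), ("a", 1), ("b", 2)]
# Sorted by the first column, the view shows a, b, c, so display position 0 is "a".
remaining = delete_displayed_rows(rows, lambda r: r[0], [0])
```

Deleting display position 0 without the translation step would remove ("c", 3), the first row of the unsorted data, instead of the row the user actually selected.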

comment:29 Changed 2 years ago by ehopf

r12544: Minor fix that activates the Export Problem Button (Defect 7).

comment:30 Changed 2 years ago by ehopf

r12545: The Statistics-View is now recalculated when a Filter is applied (Defect 6). Also made minor fixes to the statistics calculations.

comment:31 Changed 2 years ago by ehopf

  • Description modified (diff)

comment:32 Changed 2 years ago by ehopf

r12555: Removed a redundant check and fixed a validation problem in the DataGrid-View (Defect 9). Additionally changed the validation in the Manipulation-View to disallow the thousands separator as input. This prevents the thousands separator from being unintentionally used as a decimal comma, which would produce a wrong result.
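
To show why accepting the thousands separator can silently produce a wrong value, here is a minimal sketch in Python (not the project's C#, and not the validation code from r12555). The function names and the default separators are assumptions chosen for illustration: "." as the decimal separator and "," as the thousands separator.

```python
def parse_lenient(text, decimal_sep=".", group_sep=","):
    """Parse a number, silently dropping thousands separators."""
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))


def parse_strict(text, decimal_sep=".", group_sep=","):
    """Parse a number, rejecting any input that contains the thousands separator."""
    if group_sep in text:
        raise ValueError(f"thousands separator {group_sep!r} is not allowed: {text!r}")
    return float(text.replace(decimal_sep, "."))


# A user who means "one and a half" but types a decimal comma gets 15.0 here.
lenient_value = parse_lenient("1,5")

# The strict variant refuses the same input instead of guessing.
try:
    parse_strict("1,5")
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

The strict variant trades convenience for safety: inputs such as "1,000" are rejected as well, but no value is ever misread by a factor of ten or more.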

comment:33 Changed 2 years ago by ehopf

r12633: Merged trunk into DataPreprocessingImprovements-Branch.

comment:34 Changed 2 years ago by ehopf

  • Owner changed from ehopf to mkommend
  • Status changed from accepted to reviewing

comment:35 Changed 2 years ago by mkommend

  • Version changed from branch to 3.3.12

r12676: Merged changes into trunk.

comment:36 Changed 2 years ago by mkommend

r12677: Deleted DataPreprocessingImprovements branch.

comment:37 Changed 2 years ago by mkommend

  • Owner changed from mkommend to gkronber

In my opinion the changes are ready for release.

comment:38 Changed 2 years ago by mkommend

r12682: Corrected accidentally merged project files.

comment:39 Changed 2 years ago by mkommend

r12683: Corrected accidentally merged project files (second try).

comment:40 Changed 2 years ago by gkronber

I didn't make a thorough review of all changes but I tested the data preprocessing view a little bit and nothing strange occurred. So I guess we can merge this to stable.

comment:41 Changed 2 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from reviewing to readytorelease

comment:42 Changed 2 years ago by mkommend

  • Resolution set to done
  • Status changed from readytorelease to closed

r12718: Merged all changes into stable.
