Opened 10 months ago

Closed 2 months ago

#2709 closed enhancement (done)

DataPreprocessing Enhancements

Reported by: pfleck Owned by: pfleck
Priority: medium Milestone: HeuristicLab 3.3.15
Component: DataPreprocessing.Views Version: 3.3.14
Keywords: Cc:

Description (last modified by pfleck)

This ticket covers smaller visual enhancements to the preprocessing views.

This ticket depends on #2698.

  • Multi-Scatterplot changes

This ticket depends on #2713.

  • moved DataTable/ScatterPlotControl out of DataTable/ScatterPlotView
  • introduced regression curves in scatterplot

This ticket depends on #2715.

  • introduce Histogram aggregation

Enhancements

  • ViewHost/ViewShortcut usage
    • Remove ViewHost icons for the ViewShortcuts
      • Split Single- and Multi-Scatterplot
    • Remove “View Shortcuts” grouping box
    • Double-clicking a ViewShortcut should not reset the state in the new view (not possible right now because part of the view state is stored in the views rather than in the contents; will be fixed in the future)
  • PreprocessingCheckedItemView
    • Hide Move, Add, Delete buttons
    • Add Select Input/Target, All and None as Buttons (checkboxes with tooltips)
      • Remove context menu instead
  • DataGrid + Statistics
    • Show/Hide columns and rows
      • Check All, Input/Target, None Variables option
      • Initially only Input/Target variables should be checked
  • DataCompletenessChart
    • Remove title
    • Move legend to top (column style)
  • Scatterplot
    • Better default axis ranges (Bogdan's helper functions)
    • Axis description instead of legend
    • Manual axis range (also for the line chart; currently via config dialog)
  • MultiScatterplot
    • X-axis labels vertical

New features

  • Distinguish Color and Grouping option in scatterplot
    • Current “Color” feature becomes “Grouping”
    • “Color” should be possible for all features, using the Color Gradient
  • Scatterplot
    • Add slider for changing point size (currently via config dialog)
    • Add regression line and add option to show/hide (implemented in #2713)
  • MultiScatterplot
    • Add (better) tooltips (available as legend tooltips in #2713)
      • Add correlation coefficient to scatterplot (visible in tooltip of legend)
  • Histogram + MultiLinechart
    • Add chart size sliders (as in MultiScatterplot)
      • or column count?
  • Feature Correlation Matrix
    • Check All, Input/Target, None Variables option
  • The New button should open an “Are you sure? The current data will be deleted.” dialog

Change History (79)

comment:1 Changed 10 months ago by mkommend

  • Summary changed from Preprocessing Visual Enhancements to DataPreprocessing Enhancements

comment:2 Changed 10 months ago by pfleck

  • Description modified

comment:3 Changed 10 months ago by pfleck

  • Status changed from new to accepted

r14440 created branch

r14441 Copied plugins.

comment:4 Changed 10 months ago by pfleck

r14444 reverse merged r14441 because local changes were accidentally included in the branch.

comment:5 Changed 10 months ago by pfleck

r14445 Branched DataPreprocessing plugins. Adapted build paths and references.

comment:6 Changed 10 months ago by pfleck

r14446 Removed the PreprocessingScatterPlotView and use the HL ScatterPlotControl instead.

comment:7 Changed 10 months ago by pfleck

r14459

  • Removed the PreprocessingDataTable and PreprocessingDataTableView and use the HL DataTableControl instead.
  • Moved and refactored some code of PreprocessingChart and moved code that was not needed in the base classes into the actual derived classes.

Some features of the PreprocessingDataTableView are included in the regular DataTableView in #2715.

comment:8 Changed 10 months ago by pfleck

  • Description modified

comment:9 Changed 10 months ago by pfleck

r14460 Fixed missing resx in csproj.

comment:10 Changed 10 months ago by pfleck

  • Description modified

r14462

  • Added a separate MultiScatterPlot entry and removed the ViewHost views-icon instead.
  • Moved legend of DataCompletenessChart to the top and removed the title instead.

comment:11 Changed 10 months ago by pfleck

  • Description modified

r14467

  • Removed some groupboxes in ViewShortcutListView.
  • Removed unnecessary IViewChartShortcut
  • Split ScatterPlot Multi and Single into separate contents.
  • Renamed Color-combo box in Scatterplot to "Group".

comment:12 Changed 10 months ago by pfleck

r14470

  • Fixed bugs with double-click on view shortcut.
  • Reuse visual properties for single scatterplot.

comment:13 Changed 10 months ago by pfleck

  • Description modified

r14472 Better initial axis intervals for scatterplots.
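The ticket does not say how r14472 computes the initial axis intervals (the description only refers to Bogdan's helper functions). As a rough illustration, a common approach is the "nice numbers" technique, which rounds the data range and the tick interval to 1, 2 or 5 times a power of ten. The Python sketch below assumes that technique; `nice_number` and `nice_axis` are hypothetical names, not HeuristicLab APIs.

```python
import math


def nice_number(x: float, round_result: bool) -> float:
    """Return a 'nice' number (1, 2, 5 or 10 times a power of ten) close to x > 0."""
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent  # fraction lies in [1, 10)
    if round_result:
        # Round to the nearest nice value (used for the tick interval).
        if fraction < 1.5:
            nice = 1
        elif fraction < 3:
            nice = 2
        elif fraction < 7:
            nice = 5
        else:
            nice = 10
    else:
        # Round up to the next nice value (used for the overall span).
        if fraction <= 1:
            nice = 1
        elif fraction <= 2:
            nice = 2
        elif fraction <= 5:
            nice = 5
        else:
            nice = 10
    return nice * 10 ** exponent


def nice_axis(data_min: float, data_max: float, max_ticks: int = 10):
    """Return (axis_min, axis_max, interval) enclosing [data_min, data_max]; max_ticks >= 2."""
    if data_min == data_max:
        # Degenerate range: widen it so that the axis has a non-zero extent.
        data_min -= 0.5
        data_max += 0.5
    span = nice_number(data_max - data_min, round_result=False)
    interval = nice_number(span / (max_ticks - 1), round_result=True)
    # Snap the axis bounds outwards to multiples of the tick interval.
    axis_min = math.floor(data_min / interval) * interval
    axis_max = math.ceil(data_max / interval) * interval
    return axis_min, axis_max, interval
```

For example, `nice_axis(-3.2, 47.0)` yields an axis from -10 to 50 with a tick interval of 10.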

comment:14 Changed 10 months ago by pfleck

r14473 Improved default y-axis for line charts.

comment:15 Changed 9 months ago by pfleck

r14474

  • Improved legend description for grouped histogram and scatterplots.
  • Fixed initial size of points for scatterplots.
  • Added correlation calculation for scatterplots (not used yet).
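r14474 adds a correlation calculation for scatterplots, which the description plans to show in the legend tooltip. The ticket does not name the coefficient; assuming it is Pearson's r, a minimal Python sketch could look like this. `pearson_r` is a hypothetical name, and skipping rows with missing (NaN) values is an assumption based on the NaN handling mentioned in r14525.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long columns, ignoring rows where either value is NaN."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        return math.nan  # not enough complete rows
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)
```

For instance, `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` gives 1.0, while a constant column yields NaN rather than a misleading value.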

comment:16 Changed 9 months ago by pfleck

  • Description modified

r14495

  • Fixed initial point size for scatterplots.
  • Reuse the visual properties of the old data row if a single variable is changed in the ScatterPlotSingleView

comment:17 Changed 9 months ago by pfleck

  • Description modified

r14511

  • Added Check Inputs/All/None buttons instead of showing disabled buttons of the ItemCollectionView.
  • Removed the PreprocessingCheckedItemListView. A standard ListView is used instead.
  • Fixed slow updating when simultaneously (un-)checking multiple variables in the chart views (currently only works when using the new buttons).

comment:18 Changed 9 months ago by pfleck

  • Description modified

r14514

  • Added a VerticalLabel for the multi-scatterplot.
  • Added regression options for single- and multi-scatterplot
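The regression options from r14514 (and the regression curves of #2713) are not specified further here. Assuming the simplest option, an ordinary least-squares straight line, the fit and the two end points needed to draw it across the x-axis range could be sketched in Python as follows; `fit_line` and `line_endpoints` are hypothetical helpers, not the HeuristicLab implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept.

    Rows where either value is NaN are ignored (an assumption for this sketch).
    """
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    pairs: List[Point] = [(x, y) for x, y in zip(xs, ys)
                          if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two complete rows are required")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def line_endpoints(slope: float, intercept: float,
                   x_min: float, x_max: float) -> Tuple[Point, Point]:
    """A straight line only needs its two end points to be drawn over [x_min, x_max]."""
    return (x_min, slope * x_min + intercept), (x_max, slope * x_max + intercept)
```

For instance, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` returns a slope of 2.0 and an intercept of 1.0. Non-linear regression curves would need a different fit, which this sketch does not cover.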

comment:19 Changed 9 months ago by pfleck

  • Description modified

r14512 Added an option for the preprocessing scatterplot to use a color gradient instead of the chart color palette.

comment:20 Changed 9 months ago by pfleck

r14525

  • Added a suggestion feature for the single scatterplot view.
  • Shows NaN groups in scatterplot (black if gradient is selected).
  • Only enables input variables in DataGridContentView by default.
  • Added missing resx file (gradient image).

comment:21 Changed 9 months ago by pfleck

  • Description modified

r14545

  • Uses a StringMatrix for statistics instead of a WinForms data grid.
  • Precheck input/target variables only for statistics.

comment:22 Changed 9 months ago by pfleck

  • Description modified

r14546 Added shortcuts for select input/all/none variables in datagrid and statistics.

Last edited 9 months ago by pfleck

comment:23 Changed 9 months ago by pfleck

  • Owner changed from pfleck to mkommend
  • Status changed from accepted to reviewing

comment:24 Changed 8 months ago by pfleck

r14578 Fixed wrongly positioned options in histogram view.

comment:25 Changed 8 months ago by mkommend

r14579: Refactored histogram view and content to support grouping by string and datetime variables.

comment:26 Changed 8 months ago by mkommend

r14580: Changed initialization of caches to avoid NullReferenceExceptions.

comment:27 Changed 8 months ago by mkommend

r14581: Refactored the retrieval of variables for grouping (extracted the method into another class).

comment:28 Changed 8 months ago by pfleck

r14583

  • Added histogram aggregation option.
  • Show all columns in the data grid by default.

comment:29 Changed 7 months ago by mkommend

r14723: Updated branch with most recent trunk changes.

comment:30 Changed 7 months ago by mkommend

Testing

  • View shortcuts should have more descriptive names and use spaces instead of camel case, for example "Line chart" instead of "LineChart".
  • All multi-XXX charts should support opening an individual chart in a new tab by double-clicking it.
  • Data grid
    • What is the point of showing no variables? Especially since the show-columns context menu can no longer be opened.
    • The spacing between the row/column count and the action buttons should match the spacing between the action buttons and the Show Variables group box.
    • The Show Variables GroupBox should be shown as a plain label or centered; currently it looks slightly odd.
  • Statistics
    • Horizontally listed columns look much better.
    • However, would it be possible to configure the direction (horizontal vs. vertical)?
    • The Show Variables GroupBox should be laid out vertically to make better use of the available space.
    • The data grid shows all columns by default, whereas the statistics view only shows the inputs and the target. By default, all variables should be shown in both the data grid and the statistics view, but non-input variables should be highlighted, maybe in italics.
  • Line chart
    • Check and uncheck all variables have unintuitive icons. Can't you use a checked and unchecked box? (Applies to the histogram as well).
    • Reuse the icons for the data grid and statistics as well?
    • Size / Column count slider is missing. (Applies to the histogram as well).
  • Histogram
    • The title font size increases when grouping is enabled.
    • Aggregation options are pretty cool.
    • There should be an option to order the legend alphabetically instead of by occurrence in the data (comment:47).
  • Scatter plot
    • It should be possible to change the point size and transparency of the data points (applies to the multi scatter plot as well) (comment:45).
    • More reasonable default text size.
  • Multi Scatter plot
    • It should only have one size slider instead of two separate ones for width and height

Review Comments

  • Chart classes should be sealed and members should be private (e.g. LineChartView). (will be done in a separate ticket on general DataPreprocessing architecture overhaul)
  • Commented code should be removed (PreprocessingChartView). (comment:50)
  • Remove resx files (ScatterPlotSingleView) (the resx in ScatterPlotSingleView contains the gradient image)
Last edited 4 months ago by pfleck

comment:31 Changed 7 months ago by mkommend

  • Status changed from reviewing to assigned

comment:32 Changed 7 months ago by mkommend

  • Status changed from assigned to accepted

comment:33 Changed 7 months ago by mkommend

r14724: Adapted data preprocessing scatter plot to allow grouping of string variables.

comment:34 Changed 7 months ago by mkommend

  • Owner changed from mkommend to pfleck
  • Status changed from accepted to assigned

r14725: Added grouping for multi scatter plot view.

comment:35 Changed 5 months ago by pfleck

  • Status changed from assigned to accepted

comment:36 Changed 5 months ago by pfleck

  • Description modified

r14902

  • Changed chart sizing to absolute values (pixels).
  • Added chart sizing to Linechart and Histogram.

comment:37 Changed 5 months ago by pfleck

  • Description modified

comment:38 Changed 5 months ago by pfleck

r14903

  • Added a warning that data will be lost when creating a new regression/classification.
  • Renamed view shortcuts to have more descriptive names instead of camel case.
  • Added missing license header.

comment:39 Changed 5 months ago by pfleck

r14915

  • Added Check All/Inputs&Target/None Icons.
  • Improved location and formatting of the "Show Variables" groupbox in datagrid and statistics view.
  • Added an "Orientation" option for the statistics view.

comment:40 Changed 5 months ago by pfleck

r14917

  • Use the new icons for PreprocessingCheckedVariablesView (linechart, histogram).
  • Added a "lock aspect ratio" sizing for the multi scatter plot.
  • Fixed a bug in single scatter plot when changing the regression line.

comment:41 Changed 5 months ago by pfleck

  • Description modified

comment:42 Changed 5 months ago by pfleck

  • Description modified

comment:43 Changed 5 months ago by pfleck

Review comments from 2698#comment:11

  • Maximum of 20 variables should be selected by default. (comment:45)
  • All controls should be displayed (variable check box, slider, ...) before the charts are drawn (asynchronously?).
  • One slider for the chart size is sufficient. It is quite cumbersome to handle two sliders. (r14917) The point size should also be adapted when the chart size changes.
  • Possible leak (memory, window handles). Removed controls are not disposed. (comment:44)
  • Inserting and removing charts is quite ugly and pretty slow. Another possibility would be to set the column / row width to 0. Maybe that is a "better" solution. (comment:45)
Last edited 4 months ago by pfleck

comment:44 Changed 5 months ago by pfleck

r14953 Disposed dynamically created controls.

comment:45 Changed 4 months ago by pfleck

r14975

  • Improved Check/Uncheck of variables.
    • Instead of removing whole columns/rows from the table layout, the table layout stays the same and the width/height of the hidden columns/rows is set to zero.
    • Hidden charts are not updated to avoid unnecessary calculations.
  • Added a check (message box) asking whether more than 20 variables should really be displayed in the multi scatterplot or the selection should be reduced to 20.
  • Added configuration for point size/opacity and (histogram) aggregation.

comment:46 Changed 4 months ago by pfleck

r14983 Adapted DataTable/ScatterPlotControl to the recent (re-) merge of the -View and -Control.

comment:47 Changed 4 months ago by pfleck

r14993

  • Added a legend order option when grouping, for the histogram and the (single and multi) scatterplot.
  • Removed the limitation of distinct values for the singlescatterplot (for the color gradient).
  • Added a legend-visible checkbox for the multi-scatterplot.

comment:48 Changed 4 months ago by pfleck

  • Description modified

r14994 Added Check All/Inputs/None Buttons for the feature correlation view.

comment:49 Changed 4 months ago by pfleck

Reviewed r14579, r14580, r14581, r14724 and r14725.

Looks good, especially that the code for creating a single chart (CreateHistogram/Scatterplot) was moved into the respective Contents.

comment:50 Changed 4 months ago by pfleck

r14996

  • Fixed initial selection of the grouping text box (empty string instead of null to select the first entry).
  • General code fixes (removed unnecessary blank lines and code, class member order, ...)

comment:51 Changed 4 months ago by pfleck

  • Owner changed from pfleck to mkommend
  • Status changed from accepted to reviewing

comment:52 Changed 4 months ago by pfleck

  • Owner changed from mkommend to pfleck
  • Status changed from reviewing to assigned

comment:53 Changed 4 months ago by pfleck

  • Status changed from assigned to accepted

r15011 Forgot to commit the designer file (changed control name).

comment:54 Changed 4 months ago by pfleck

  • Owner changed from pfleck to mkommend
  • Status changed from accepted to reviewing

r15012 Merged trunk changes to branch.

comment:55 Changed 4 months ago by mkommend

Review comments

  • Images for shown columns are confusing (comment:58)
  • Data Grid
  • Statistics
    • columns -> variables, features? (comment:57)
    • rows -> count, values? (comment:57)
    • Show Variables/orientation group box should be vertically aligned with the overview (comment:57)
  • Scatter Plot
    • Use combo boxes for x and y so that the values are not writable (comment:57)
  • Multi Scatter Plot
    • Spacing in chart size group box (comment:57)
    • Improve error message if more than 20 variables are displayed (comment:58)
  • Histogram
    • Title font is too large (only noticeable when grouping is enabled) (comment:58)

Source is reviewed.

Last edited 4 months ago by pfleck

comment:56 Changed 4 months ago by mkommend

  • Owner changed from mkommend to pfleck
  • Status changed from reviewing to assigned

comment:57 Changed 4 months ago by pfleck

  • Status changed from assigned to accepted

r15019

  • Renamed Column -> Variable, Row -> Datarow.
  • Scatterplot now uses regular combo boxes for variables.
  • Adapted sizing and minor layout details in the multi scatterplot, histogram, statistics and data grid.
Last edited 4 months ago by pfleck

comment:58 Changed 4 months ago by pfleck

r15021

  • New Icons for Check All, Inputs&Target, None
  • Smaller title font for histograms.
  • Changed warning for multiscatterplot.

comment:59 Changed 4 months ago by pfleck

  • Owner changed from pfleck to mkommend
  • Status changed from accepted to reviewing

comment:60 Changed 4 months ago by pfleck

r15027 Fixed an issue with empty charts.

comment:61 Changed 4 months ago by pfleck

r15028 Changed the access modifier of some controls to prevent the designer from generating code that can break the layout.

comment:62 Changed 3 months ago by pfleck

r15036

  • Used title for showing variable name in the histogram instead of the legend. Legend is now used for grouping only.
  • Fixed Variables/Datarows in StatisticsView (were switched).
  • Improved the "Warning Dialog" for the MultiScatterPlot when too many variables might be shown.
    • Option for checking "None".
    • Show the dialog before the charts are calculated internally.

r15037 Added the license region (it had somehow been deleted).

comment:63 Changed 3 months ago by mkommend

Reviewed all changesets.

comment:64 Changed 3 months ago by mkommend

  • Owner changed from mkommend to pfleck
  • Status changed from reviewing to assigned

Please ensure that the mentioned tickets (#2698, #2713, #2715) are reviewed and released. If that is the case merge the branch into the trunk and remove the branch.

comment:65 Changed 3 months ago by pfleck

r15041 Fixed an issue when double-clicking the scatterplot (error in re-layouting the color gradient).

comment:66 Changed 3 months ago by pfleck

r15043 Adapted to recent trunk change (r15042)

comment:67 Changed 3 months ago by pfleck

r15090 Adapted to recent trunk changes (r15068)

comment:68 Changed 3 months ago by pfleck

r15101 Merged recent trunk changes

comment:69 Changed 3 months ago by pfleck

r15110: merged branch to trunk

comment:70 Changed 3 months ago by gkronber

  • Version changed from branch to 3.3.14

comment:71 Changed 3 months ago by pfleck

r15119

  • fixed combobox behavior in single scatterplot view
  • made PreprocessingCheckedVariablesView abstract to fix unit test

comment:72 Changed 2 months ago by pfleck

r15210 Fixed some small issues and some default behavior.

  • DataGrid
    • Statistics Overview (at bottom) fixed
  • FilterView
    • Apply button at the bottom is always visible
  • LegendOrder: alphabetical order is the default and is listed first in the combo box
  • PreprocessingChartView
    • Column width is now updated when the window resizes. This also fixes the issue that a double-click (to show in a new tab) showed no charts (they were too thin because the resizing was missing)
  • PreprocessingIcons
    • Fixed case (in)sensitivity of the image paths (for mono build)
  • SingleScatterPlotView
    • Limited the max distinct values for grouping to 50
  • StatisticsView
    • Vertical by default
    • Resized some textboxes
Last edited 2 months ago by mkommend

comment:73 Changed 2 months ago by pfleck

  • Owner changed from pfleck to mkommend
  • Status changed from assigned to reviewing

comment:74 Changed 2 months ago by mkommend

  • Status changed from reviewing to readytorelease

Reviewed r15210 and everything works as described.

comment:75 Changed 2 months ago by mkommend

  • Status changed from readytorelease to reviewing

comment:76 Changed 2 months ago by mkommend

  • Owner changed from mkommend to pfleck
  • Status changed from reviewing to readytorelease

comment:77 Changed 2 months ago by pfleck

r15242 merged to stable

comment:78 Changed 2 months ago by pfleck

r15243 deleted branch

comment:79 Changed 2 months ago by pfleck

  • Resolution set to done
  • Status changed from readytorelease to closed