Opened 6 years ago

Closed 6 years ago

#1640 closed enhancement (done)

Refactor datasets to allow the storage of strings and datetimes

Reported by: mkommend Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.6
Component: Problems.DataAnalysis Version: 3.3.6
Keywords: Cc:

Description

Currently only double values can be stored in the Dataset. Although these are the only values that can be used for modeling, string and DateTime values are useful for informational purposes.

Change History (15)

comment:1 Changed 6 years ago by mkommend

  • Status changed from new to accepted

r6740:

  • Fixed TableFileParser to handle empty rows correctly.
  • Refactored Dataset to store values in List<List> instead of a two-dimensional array.
  • Enabled importing and storing string and datetime values.
  • Changed data access methods in the dataset and adapted all affected classes.
  • Changed interpreter to store the variable values for all rows during the compilation step.
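A minimal C# sketch of the storage idea described in r6740, assuming a column-per-variable layout: each variable owns one list whose element type (double, string or DateTime) may differ from the other columns, and only double columns are handed to modeling code. The dictionary, the sample data and the names `GetDoubleValues`, `DoubleVariables` and `GetValue` are hypothetical illustrations, not the actual HeuristicLab Dataset API.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical column store (not the actual HeuristicLab Dataset): one list per
// variable, and the element type may differ from column to column.
var variableValues = new Dictionary<string, IList>
{
    ["Temperature"] = new List<double> { 21.5, 22.0, 23.1 },
    ["Station"] = new List<string> { "A", "B", "A" },
    ["Timestamp"] = new List<DateTime>
    {
        new DateTime(2011, 9, 1), new DateTime(2011, 9, 2), new DateTime(2011, 9, 3)
    },
};

// Only double columns can be used for modeling, so this accessor rejects the others.
IEnumerable<double> GetDoubleValues(string variableName) =>
    variableValues[variableName] as List<double>
        ?? throw new ArgumentException($"'{variableName}' is not a double variable.");

// Names of the variables that are usable as model inputs.
IEnumerable<string> DoubleVariables() =>
    variableValues.Where(kv => kv.Value is List<double>).Select(kv => kv.Key);

// Untyped access to any cell, e.g. for displaying string and DateTime values.
object GetValue(string variableName, int row) => variableValues[variableName][row];

Console.WriteLine(string.Join(", ", DoubleVariables()));
```

Keeping each column in its own typed list avoids boxing the numeric values while still letting string and DateTime columns live in the same dataset.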

comment:2 Changed 6 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from accepted to assigned

The SymbolicDataAnalysisExpressionTreeILEmittingInterpreter must be adapted to work with r6740 correctly.

comment:3 Changed 6 years ago by gkronber

  • Status changed from assigned to accepted

comment:4 Changed 6 years ago by gkronber

r6741: adapted the IL-emitting interpreter to work with the dataset changes from r6740.

comment:5 Changed 6 years ago by gkronber

  • Status changed from accepted to reviewing

comment:6 Changed 6 years ago by mkommend

r6749: Added Storable attribute to dataset values.

comment:7 Changed 6 years ago by gkronber

r6754: fixed a problem with empty training indices when using cross-validation.

comment:8 Changed 6 years ago by gkronber

r6769: fixed a bug, introduced in r6740, in the interpretation of lagged variables.
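For context, a lagged variable evaluates another variable at a shifted row, row + lag, where a negative lag refers to earlier rows. Since r6740 the interpreter stores the variable values for all rows during compilation, so the lookup becomes an index shift into a cached column. The C# sketch below assumes that rows shifted outside the column yield NaN; the name `LaggedValue` and the sample column are hypothetical, and this is not the code of the r6769 fix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical: the values of one variable, cached for all rows.
double[] column = { 1.0, 2.0, 4.0, 8.0 };

// Value of a lagged variable at `row`: read row + lag from the cached column.
// Assumed policy: a shifted row outside the column yields NaN.
double LaggedValue(IReadOnlyList<double> values, int row, int lag)
{
    int actualRow = row + lag;
    return actualRow >= 0 && actualRow < values.Count ? values[actualRow] : double.NaN;
}

// Evaluate the variable with lag -1 (previous row) for every row.
double[] lagged = Enumerable.Range(0, column.Length)
                            .Select(row => LaggedValue(column, row, -1))
                            .ToArray();
Console.WriteLine(string.Join(" ", lagged));
```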

comment:9 Changed 6 years ago by gkronber

Importing data does not work reliably. In lines 100 - 122 of the TableFileParser, the type of each column is first determined heuristically, and then the parsed values are filled into the columns.

First, the statement in line 100:

var columnType = types.GroupBy(v => v).OrderBy(v => v).Last().Key;

throws an exception because a group (an enumerable) cannot be used as a sort key. Possibly OrderBy(v => v.Count()) was meant?

The next problem is that the type of a column is determined by the majority of the parsed elements in that column, but the elements are later simply pushed into the columns without checking whether their types are compatible. This again throws an exception.
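A hedged C# sketch of the two fixes suggested in this comment: each cell is classified as double, DateTime or string; the column type is the majority vote, sorting the groups by their count as proposed; and each value is then converted to the column type instead of being pushed in unchecked. The names `DetectType`, `DetermineColumnType` and `FillColumn`, invariant-culture parsing, NaN for non-numeric cells in a numeric column, and the fallback to strings for mixed DateTime columns are all illustrative assumptions, not necessarily what the TableFileParser does after r6776.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Classify one cell. Numbers are tried first, so "3" counts as double, not DateTime.
Type DetectType(string cell)
{
    if (double.TryParse(cell, NumberStyles.Float, CultureInfo.InvariantCulture, out _))
        return typeof(double);
    if (DateTime.TryParse(cell, CultureInfo.InvariantCulture, DateTimeStyles.None, out _))
        return typeof(DateTime);
    return typeof(string);
}

// Majority vote: sort the groups by their size (a group itself has no ordering,
// which is why OrderBy(v => v) throws) and take the key of the largest group.
Type DetermineColumnType(IReadOnlyList<string> cells) =>
    cells.Select(DetectType)
         .GroupBy(t => t)
         .OrderBy(g => g.Count())
         .Last()
         .Key;

// Convert every cell to the column type instead of pushing it in unchecked.
IList FillColumn(IReadOnlyList<string> cells)
{
    Type columnType = DetermineColumnType(cells);
    if (columnType == typeof(double))
    {
        // Assumed policy: non-numeric cells in a numeric column become NaN.
        return cells.Select(c => double.TryParse(c, NumberStyles.Float, CultureInfo.InvariantCulture, out double d) ? d : double.NaN)
                    .ToList();
    }
    if (columnType == typeof(DateTime) && cells.All(c => DetectType(c) == typeof(DateTime)))
    {
        return cells.Select(c => DateTime.Parse(c, CultureInfo.InvariantCulture)).ToList();
    }
    // Assumed policy: mixed columns fall back to strings, which accept any cell.
    return cells.ToList();
}

IList sample = FillColumn(new[] { "1.5", "2", "missing", "3" });
Console.WriteLine($"{sample.Count} values");
```

Converting instead of casting means a minority of odd cells can no longer abort the whole import; which replacement value to use is a policy decision.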

comment:10 Changed 6 years ago by gkronber

  • Status changed from reviewing to assigned

comment:11 Changed 6 years ago by gkronber

  • Status changed from assigned to accepted

comment:12 Changed 6 years ago by gkronber

r6776: fixed a bug in parsing datetime values and improved the code for filling dataset columns.

comment:13 Changed 6 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

comment:14 Changed 6 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to readytorelease

Reviewed and tested r6741, r6754, r6769 and r6776.

comment:15 Changed 6 years ago by swagner

  • Resolution set to done
  • Status changed from readytorelease to closed
  • Version changed from 3.3.5 to 3.3.6