Opened 11 years ago

Closed 11 years ago

#1640 closed enhancement (done)

Refactor datasets to allow the storage of strings and datetimes

Reported by: mkommend
Owned by: gkronber
Priority: medium
Milestone: HeuristicLab 3.3.6
Component: Problems.DataAnalysis
Version: 3.3.6
Keywords:
Cc:

Description

Currently, the Dataset can store only double values. Although these are the only values that can be used for modeling, string and DateTime values are useful for informational purposes.

Change History (15)

comment:1 Changed 11 years ago by mkommend

  • Status changed from new to accepted

r6740:

  • Corrected TableFileParser to handle empty rows correctly.
  • Refactored DataSet to store values in List<List> instead of a two-dimensional array.
  • Enabled importing and storing string and datetime values.
  • Changed data access methods in dataset and adapted all affected classes.
  • Changed interpreter to store the variable values for all rows during the compilation step.

comment:2 Changed 11 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from accepted to assigned

The SymbolicDataAnalysisExpressionTreeILEmittingInterpreter must be adapted to work correctly with r6740.

comment:3 Changed 11 years ago by gkronber

  • Status changed from assigned to accepted

comment:4 Changed 11 years ago by gkronber

r6741: adapted IL emitting interpreter to work with previous (r6740) changes of the dataset.

comment:5 Changed 11 years ago by gkronber

  • Status changed from accepted to reviewing

comment:6 Changed 11 years ago by mkommend

r6749: Added Storable attribute to dataset values.

comment:7 Changed 11 years ago by gkronber

r6754: fixed a problem with empty trainingindizes when using cross-validation

comment:8 Changed 11 years ago by gkronber

r6769: fixed a bug in the interpretation of lagged variables introduced with r6740.

comment:9 Changed 11 years ago by gkronber

Importing data does not work reliably. In the TableFileParser (lines 100 - 122), the type of each column is first determined heuristically, and then the parsed values are filled into the columns.

First of all, the statement in line 100:

var columnType = types.GroupBy(v => v).OrderBy(v => v).Last().Key;

throws an exception because the groups produced by GroupBy are enumerables and cannot be used as a sort key. Possibly OrderBy(v => v.Count()) was meant?
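
A minimal sketch of the majority vote that the statement presumably intended, assuming types holds one inferred Type per parsed cell of a column. The names ColumnTypeHeuristic and MajorityType are hypothetical and not part of TableFileParser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from the HeuristicLab sources.
public static class ColumnTypeHeuristic {
  // Returns the type that occurs most often in the given sequence.
  // Sorting by group size avoids comparing the IGrouping objects themselves,
  // which do not implement IComparable.
  public static Type MajorityType(IEnumerable<Type> types) {
    return types
      .GroupBy(t => t)          // one group per distinct type
      .OrderBy(g => g.Count())  // smallest group first
      .Last()                   // largest group; throws on an empty sequence
      .Key;
  }
}
```

Because OrderBy is a stable sort, a tie between two types is resolved in favor of the type that first appears later in the column. Whether that is the desired tie-break is an open design choice.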

The next problem is that the type of a column is determined by the majority of the parsed elements for that column, but the elements are later simply pushed into the column without checking whether their type is compatible with it. This throws an exception again.
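
One possible way to avoid the second exception is to check each cell while filling the column, sketched below. This is only an illustration under assumptions: the names ColumnFiller and Fill are hypothetical, the cells are assumed to be available as raw strings, and replacing incompatible cells with a placeholder (NaN or DateTime.MinValue) is an assumed policy, not necessarily what the parser does.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not taken from the HeuristicLab sources.
public static class ColumnFiller {
  // Builds a typed column from raw cell strings. Cells that cannot be parsed
  // as the chosen column type become a placeholder instead of throwing.
  public static IList Fill(IEnumerable<string> cells, Type columnType) {
    if (columnType == typeof(double)) {
      var column = new List<double>();
      foreach (var cell in cells) {
        double value;
        bool ok = double.TryParse(cell, NumberStyles.Float,
                                  CultureInfo.InvariantCulture, out value);
        column.Add(ok ? value : double.NaN);  // assumed placeholder
      }
      return column;
    }
    if (columnType == typeof(DateTime)) {
      var column = new List<DateTime>();
      foreach (var cell in cells) {
        DateTime value;
        bool ok = DateTime.TryParse(cell, CultureInfo.InvariantCulture,
                                    DateTimeStyles.None, out value);
        column.Add(ok ? value : DateTime.MinValue);  // assumed placeholder
      }
      return column;
    }
    // Any other column type is stored as raw strings, which always succeeds.
    return new List<string>(cells);
  }
}
```

For example, Fill(new[] { "1.5", "abc", "2" }, typeof(double)) yields a List<double> with three entries, the second of which is NaN.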

comment:10 Changed 11 years ago by gkronber

  • Status changed from reviewing to assigned

comment:11 Changed 11 years ago by gkronber

  • Status changed from assigned to accepted

comment:12 Changed 11 years ago by gkronber

r6776: fixed a bug in parsing datetime values and improved the code for filling dataset columns

comment:13 Changed 11 years ago by gkronber

  • Owner changed from gkronber to mkommend
  • Status changed from accepted to reviewing

comment:14 Changed 11 years ago by mkommend

  • Owner changed from mkommend to gkronber
  • Status changed from reviewing to readytorelease

Reviewed and tested r6741, r6754, r6769 and r6776.

comment:15 Changed 11 years ago by swagner

  • Resolution set to done
  • Status changed from readytorelease to closed
  • Version changed from 3.3.5 to 3.3.6