#2859 closed defect (done)

Improve TableFileParser column-type deduction for missing values

Reported by: pfleck Owned by: pfleck
Priority: medium Milestone: HeuristicLab 3.3.15
Component: Problems.DataAnalysis Version: trunk
Keywords: Cc:

Description

A DateTime-column whose first value is missing is wrongly interpreted as a string-column.

Column-type deduction of the TableFileParser currently works in the following way:

  1. The initial type of a column is based on the value in the first data-row.
  2. If the first value is missing, the column is a double-column.
  3. If a new value is not compatible with the current column-type, the column is converted to a string-column.

For a DateTime-column whose first value is missing, the parser first decides it is a double-column and later converts it to a string-column when the first DateTime-value appears.
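
The three rules above can be reproduced in a few lines. The following is only a minimal Python sketch of the described behaviour, not the actual C# code of the TableFileParser; the names parse_kind and deduce_eager, the ISO date format, and the treatment of empty tokens as missing values are assumptions made for illustration.

```python
from datetime import datetime


def parse_kind(token):
    """Hypothetical helper: classify one token (empty text counts as missing)."""
    if token.strip() == "":
        return "missing"
    try:
        float(token)
        return "double"
    except ValueError:
        pass
    try:
        datetime.fromisoformat(token)
        return "datetime"
    except ValueError:
        return "string"


def deduce_eager(tokens):
    """Deduce the column type the way the ticket describes the current parser."""
    column_type = None
    for token in tokens:
        kind = parse_kind(token)
        if column_type is None:
            # Rules 1 and 2: the first row decides; a missing value means double.
            column_type = "double" if kind == "missing" else kind
        elif kind not in ("missing", column_type):
            # Rule 3: an incompatible value turns the column into a string-column.
            column_type = "string"
    return column_type


with_gap_first = deduce_eager(["", "2017-12-01", "2017-12-02"])
with_gap_later = deduce_eager(["2017-12-01", "", "2017-12-02"])
```

With these inputs, only the position of the missing value differs, yet the first column ends up as a string-column and the second as a DateTime-column.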

The TableFileParser should be improved so that the column-type deduction is deferred until the first non-missing value appears.

Attachments (1)

TableFileParser.cs.patch (5.1 KB) - added by pfleck 13 months ago.


Change History (9)

Changed 13 months ago by pfleck

comment:1 Changed 13 months ago by pfleck

  • Status changed from new to accepted

Added a patch that treats columns of unknown type as List<object> and converts them to a specific list-type when the first non-missing value occurs for that column.
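
The idea can be illustrated with a small Python sketch. It is not the attached patch: the untyped list stands in for List<object>, and the names classify, read_column, MISSING and PARSERS, as well as the chosen missing-value placeholders, are assumptions made for this example.

```python
import math
from datetime import datetime


def classify(token):
    """Hypothetical helper: classify one token (empty text counts as missing)."""
    if token.strip() == "":
        return "missing"
    for name, parse in (("double", float), ("datetime", datetime.fromisoformat)):
        try:
            parse(token)
            return name
        except ValueError:
            continue
    return "string"


# Placeholder for a missing value in each column type (assumed for this sketch).
MISSING = {"double": math.nan, "datetime": None, "string": ""}
PARSERS = {"double": float, "datetime": datetime.fromisoformat, "string": str}


def read_column(tokens):
    """Return (column_type, values); the type stays unknown while values are missing."""
    column_type = None  # unknown until the first non-missing value
    values = []         # untyped buffer, the analogue of List<object>
    for i, token in enumerate(tokens):
        kind = classify(token)
        if kind == "missing":
            values.append(None if column_type is None else MISSING[column_type])
            continue
        if column_type is None:
            # First non-missing value: fix the type and convert the buffer.
            column_type = kind
            values = [MISSING[kind]] * len(values)
        elif kind != column_type and column_type != "string":
            # Incompatible value: fall back to a string-column of the raw text.
            column_type = "string"
            values = list(tokens[:i])
        values.append(PARSERS[column_type](token))
    return column_type, values


dates_type, dates = read_column(["", "2017-12-01", ""])
```

In this sketch a column that contains only missing values keeps the type None; how the real parser treats that case is not stated in the ticket.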

comment:2 Changed 13 months ago by pfleck

  • Owner changed from pfleck to gkronber
  • Status changed from accepted to reviewing

comment:3 Changed 12 months ago by gkronber

  • Owner changed from gkronber to pfleck
  • Status changed from reviewing to assigned

Reviewed the patch. Looks good.

comment:4 Changed 12 months ago by pfleck

  • Status changed from assigned to accepted
  • Version set to trunk

r15513: Fixed the problem by temporarily using a List<object> to represent an unknown column-type until the type is known.

comment:5 Changed 12 months ago by pfleck

  • Milestone changed from HeuristicLab 3.3.16 to HeuristicLab 3.3.15

comment:6 Changed 12 months ago by pfleck

  • Status changed from accepted to readytorelease

comment:7 Changed 12 months ago by pfleck

r15537: merged r15513 to stable

comment:8 Changed 12 months ago by pfleck

  • Resolution set to done
  • Status changed from readytorelease to closed