Opened 7 years ago
Closed 7 years ago
#2859 closed defect (done)
Improve TableFileParser column-type deduction for missing values
Reported by: | pfleck | Owned by: | pfleck |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.15 |
Component: | Problems.DataAnalysis | Version: | trunk |
Keywords: | | Cc: | |
Description
A DateTime column whose first value is missing is wrongly interpreted as a string column.
Column-type deduction in the TableFileParser currently works as follows:
- The initial type of a column is determined by its value in the first data row.
- If that first value is missing, the column is treated as a double column.
- If a later value is not compatible with the current column type, the column is converted to a string column.
For a DateTime column whose first value is missing, the parser therefore first classifies it as a double column and then converts it to a string column as soon as the first DateTime value appears.
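The eager rules listed above can be modeled with a small sketch. It is written in Python rather than the parser's own C#, and it is only an illustration: the helpers `parse_value` and `deduce_eager` are hypothetical, only double, DateTime and string columns are considered, and dates are assumed to use the `YYYY-MM-DD` format.

```python
from datetime import datetime

# Hypothetical model of the original (eager) deduction rules.
# Column types are represented by Python classes: float (double), datetime, str.


def parse_value(text):
    """Return (type, value) for one cell; a None type means the value is missing."""
    if text == "":
        return None, None
    try:
        return float, float(text)
    except ValueError:
        pass
    try:
        # Assumption: dates are written as YYYY-MM-DD.
        return datetime, datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return str, text


def deduce_eager(cells):
    """The first row fixes the type; a missing first value defaults to double."""
    first_type, _ = parse_value(cells[0])
    column_type = first_type if first_type is not None else float
    for text in cells[1:]:
        value_type, _ = parse_value(text)
        # Any non-missing value of a different type demotes the column to string.
        if value_type is not None and value_type is not column_type:
            column_type = str
    return column_type
```

With a missing first cell, `deduce_eager(["", "2017-11-20"])` yields `str`, which reproduces the defect. The same dates without the leading gap yield `datetime`.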
The TableFileParser should be improved so that column-type deduction is deferred until the first non-missing value appears.
Attachments (1)
Change History (9)
Changed 7 years ago by pfleck
comment:1 Changed 7 years ago by pfleck
- Status changed from new to accepted
comment:2 Changed 7 years ago by pfleck
- Owner changed from pfleck to gkronber
- Status changed from accepted to reviewing
comment:3 Changed 7 years ago by gkronber
- Owner changed from gkronber to pfleck
- Status changed from reviewing to assigned
Reviewed the patch. Looks good.
comment:4 Changed 7 years ago by pfleck
- Status changed from assigned to accepted
- Version set to trunk
r15513: Fixed the problem by temporarily representing a column of unknown type as a List<object> until its type is known.
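The deferred approach can be sketched the same way. This is a Python model, not the code from r15513. While `column_type` is `None`, the plain list plays the role of the untyped List<object>, and missing cells are stored as `None`. The first non-missing value fixes the type, and an incompatible later value still falls back to a string column. The names `parse_value` and `parse_column`, the `YYYY-MM-DD` date format, and the handling of all-missing columns (the type simply stays `None` here) are assumptions. The ticket does not say how the real parser resolves that last case.

```python
from datetime import datetime

# Hypothetical model of deferred column-type deduction.


def parse_value(text):
    """Return (type, value) for one cell; a None type means the value is missing."""
    if text == "":
        return None, None
    try:
        return float, float(text)
    except ValueError:
        pass
    try:
        # Assumption: dates are written as YYYY-MM-DD.
        return datetime, datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return str, text


def parse_column(cells):
    """Return (column_type, values); column_type stays None if all cells are missing."""
    column_type = None  # unknown type: the list holds generic objects for now
    values = []
    for i, text in enumerate(cells):
        value_type, value = parse_value(text)
        if value_type is None:
            values.append(None)  # missing value, type decision is postponed
            continue
        if column_type is None:
            column_type = value_type  # first non-missing value decides the type
        elif column_type is not str and value_type is not column_type:
            # Incompatible value: re-read the cells seen so far as strings.
            column_type = str
            values = [c if c != "" else None for c in cells[:i]]
        values.append(text if column_type is str else value)
    return column_type, values
```

For example, `parse_column(["", "2017-11-20"])` now returns `datetime` as the column type, with `None` in place of the missing first value.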
comment:5 Changed 7 years ago by pfleck
- Milestone changed from HeuristicLab 3.3.16 to HeuristicLab 3.3.15
comment:6 Changed 7 years ago by pfleck
- Status changed from accepted to readytorelease
comment:7 Changed 7 years ago by pfleck
comment:8 Changed 7 years ago by pfleck
- Resolution set to done
- Status changed from readytorelease to closed
Added patch that treats columns of unknown type as List<object> and converts them to a specifically typed list when the first non-missing value for that column occurs.