Opened 8 years ago
Closed 7 years ago
#2661 closed defect (done)
Remove bugs in TableFileParser
Reported by: | mkommend | Owned by: | gkronber |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.15 |
Component: | Problems.Instances | Version: | 3.3.14 |
Keywords: | Cc: |
Description (last modified by mkommend)
During modeling and importing CSV files in HeuristicLab I discovered three bugs in the current TableFileParser:
- Converting the data types of columns could change the values (test_csv_importer_conversion). The first missing values are replaced with double.NaN while the last ones with string.Empty.
- Errors in files after the look a head limit (per default 500 lines) are not reported, the progress dialog hangs and occurring exceptions are swallowed.
- test_csv_importer_conversion2: column y contains a character after line 500 -> Exception when setting y as the target variable.
- test_csv_importer_columns: row 500 has one additional value -> Exception while importing
I am unsure whether these bugs were introduced with #2071 or if they have already been present.
Attachments (3)
Change History (18)
Changed 8 years ago by mkommend
Changed 8 years ago by mkommend
Changed 8 years ago by mkommend
comment:1 Changed 8 years ago by mkommend
- Description modified (diff)
comment:2 Changed 8 years ago by gkronber
comment:3 Changed 8 years ago by gkronber
- make sure the progressbar is removed when the TableFileParser throws an exception
- TableFileParser throws IOException instead of DataFormatException
comment:4 Changed 8 years ago by gkronber
r14288: fixed build fail because of deletion of DataFormatException class
comment:5 Changed 8 years ago by gkronber
r14296: implemented fixes for several problems in the TableFileParser.
We now also store the original string representation of all tokens and use those when we detect that a column cannot be read as DateTime / double.
Unfortunately, more memory is required for this solution. However, I found no better way because we cannot restart reading the input stream. Converting the parsed values back to a string representation involves so many edge cases that it would be hard to guarantee that we always produce an output which matches the original input string.
comment:6 Changed 8 years ago by gkronber
- Status changed from new to accepted
comment:7 Changed 8 years ago by gkronber
r14298: catch and handle all Exceptions because otherwise they are swallowed by the TPL
comment:8 follow-up: ↓ 11 Changed 8 years ago by gkronber
The current implementation is problematic because Exceptions are handled directly by showing a message box. However, if the problem is loaded programmatically, e.g. from the 'Generate Experiment'-dialog this completely blocks the UI.
This must be refactored completely.
comment:9 follow-up: ↓ 10 Changed 8 years ago by mkommend
During my work with new datasets I discovered an additional bug in the TableFileParser. DateTime values contained in a CSV file can only be imported as string values. r14408 fixes the bug that occurred due to wrong logical junction in an if clause (and instead of or) that was introduced with r14298. A result of that bug was that no DateTime columns containing a date in the first row, which day is the first, or a date in January or any date in 0001, could be parsed correctly.
r14408: Fixed a bug in the TableFileparser that omits the correct detection of DateTime columns.
comment:10 in reply to: ↑ 9 Changed 8 years ago by gkronber
comment:11 in reply to: ↑ 8 Changed 7 years ago by gkronber
Replying to gkronber:
The current implementation is problematic because exceptions are handled directly by showing a message box. However, if the problem is loaded programmatically, e.g. from the 'Generate Experiment'-dialog this completely blocks the UI.
This must be refactored completely.
Cannot reproduce this. The 'Generate Experiment'-dialog handles Exceptions well by showing a message box. It handles exceptions when retrieving the data descriptors as well as when loading the data correctly.
The problem instance providers which are used by the 'Generate Experiment'-dialog never catch exceptions or show a message box.
comment:12 Changed 7 years ago by gkronber
comment:13 Changed 7 years ago by gkronber
- Status changed from accepted to readytorelease
comment:14 Changed 7 years ago by gkronber
all changesets merged to stable
comment:15 Changed 7 years ago by gkronber
- Resolution set to done
- Status changed from readytorelease to closed
r14284: added unit tests to reproduce the described bugs