Free cookie consent management tool by TermsFeed Policy Generator

Opened 8 years ago

Closed 7 years ago

#2661 closed defect (done)

Remove bugs in TableFileParser

Reported by: mkommend Owned by: gkronber
Priority: medium Milestone: HeuristicLab 3.3.15
Component: Problems.Instances Version: 3.3.14
Keywords: Cc:

Description (last modified by mkommend)

During modeling and importing CSV files in HeuristicLab I discovered three bugs in the current TableFileParser:

  • Converting the data types of columns could change the values (test_csv_importer_conversion). The first missing values are replaced with double.NaN while the last ones with string.Empty.
  • Errors in files after the look a head limit (per default 500 lines) are not reported, the progress dialog hangs and occurring exceptions are swallowed.
    • test_csv_importer_conversion2: column y contains a character after line 500 -> Exception when setting y as the target variable.
    • test_csv_importer_columns: row 500 has one additional value -> Exception while importing

I am unsure whether these bugs were introduced with #2071 or if they have already been present.

Attachments (3)

csv_importer_test_columns.csv (3.5 KB) - added by mkommend 8 years ago.
csv_importer_test_conversion.csv (117 bytes) - added by mkommend 8 years ago.
csv_importer_test_conversion2.csv (3.5 KB) - added by mkommend 8 years ago.

Download all attachments as: .zip

Change History (18)

Changed 8 years ago by mkommend

Changed 8 years ago by mkommend

Changed 8 years ago by mkommend

comment:1 Changed 8 years ago by mkommend

  • Description modified (diff)

comment:2 Changed 8 years ago by gkronber

r14284: added unit tests to reproduce the described bugs

comment:3 Changed 8 years ago by gkronber

r14285:14286:

  • make sure the progressbar is removed when the TableFileParser throws an exception
  • TableFileParser throws IOException instead of DataFormatException
Last edited 8 years ago by gkronber (previous) (diff)

comment:4 Changed 8 years ago by gkronber

r14288: fixed build fail because of deletion of DataFormatException class

comment:5 Changed 8 years ago by gkronber

r14296: implemented fixes for several problems in the TableFileParser.

We now also store the original string representation of all tokens and use those when we detect that a column cannot be read as DateTime / double.

Unfortunately, more memory is required for this solution. However, I found no better way because we cannot restart reading the input stream. Converting the parsed values back to a string representation involves so many edge cases that it would be hard to guarantee that we always produce an output which matches the original input string.

comment:6 Changed 8 years ago by gkronber

  • Status changed from new to accepted

comment:7 Changed 8 years ago by gkronber

r14298: catch and handle all Exceptions because otherwise they are swallowed by the TPL

comment:8 follow-up: Changed 8 years ago by gkronber

The current implementation is problematic because exceptions are handled directly by showing a message box. However, if the problem is loaded programmatically, e.g. from the 'Generate Experiment'-dialog this completely blocks the UI.

This must be refactored completely.

Last edited 8 years ago by gkronber (previous) (diff)

comment:9 follow-up: Changed 8 years ago by mkommend

During my work with new datasets I discovered an additional bug in the TableFileParser. DateTime values contained in a CSV file can only be imported as string values. r14408 fixes the bug that occurred due to wrong logical junction in an if clause (and instead of or) that was introduced with r14298. A result of that bug was that no DateTime columns containing a date in the first row, which day is the first, or a date in January or any date in 0001, could be parsed correctly.

r14408: Fixed a bug in the TableFileparser that omits the correct detection of DateTime columns.

Last edited 8 years ago by mkommend (previous) (diff)

comment:10 in reply to: ↑ 9 Changed 8 years ago by gkronber

Replying to mkommend:

r14408: Fixed a bug in the TableFileparser that omits the correct detection of DateTime columns.

Reviewed fix. Thanks for correcting this.

comment:11 in reply to: ↑ 8 Changed 8 years ago by gkronber

Replying to gkronber:

The current implementation is problematic because exceptions are handled directly by showing a message box. However, if the problem is loaded programmatically, e.g. from the 'Generate Experiment'-dialog this completely blocks the UI.

This must be refactored completely.

Cannot reproduce this. The 'Generate Experiment'-dialog handles Exceptions well by showing a message box. It handles exceptions when retrieving the data descriptors as well as when loading the data correctly.

The problem instance providers which are used by the 'Generate Experiment'-dialog never catch exceptions or show a message box.

comment:12 Changed 8 years ago by gkronber

r15170: merged r14284:14286, r14288, r14296, r14298, r14408 from trunk to stable

comment:13 Changed 8 years ago by gkronber

  • Status changed from accepted to readytorelease

comment:14 Changed 8 years ago by gkronber

all changesets merged to stable

comment:15 Changed 7 years ago by gkronber

  • Resolution set to done
  • Status changed from readytorelease to closed
Note: See TracTickets for help on using tickets.