Opened 12 years ago
Closed 11 years ago
#1941 closed feature request (done)
Additional classification problem instances
Reported by: | sforsten | Owned by: | abeham |
---|---|---|---|
Priority: | medium | Milestone: | HeuristicLab 3.3.8 |
Component: | Problems.Instances | Version: | 3.3.8 |
Keywords: | | Cc: | |
Description
Adding additional classification problems to problem instances from http://archive.ics.uci.edu/ml/datasets.html
Change History (24)
comment:1 Changed 12 years ago by gkronber
- Summary changed from Adding addtitional classification problems to Addtitional classification problems
- Type changed from enhancement to feature request
comment:2 Changed 12 years ago by sforsten
- Status changed from new to accepted
comment:3 Changed 12 years ago by sforsten
comment:4 Changed 12 years ago by sforsten
r8596: adjusted unit tests
comment:5 Changed 12 years ago by mkommend
- Milestone changed from HeuristicLab 3.3.x Backlog to HeuristicLab 3.3.8
comment:6 Changed 12 years ago by abeham
I just received this error when building HL as Release and x86 using Build.cmd:
"c:\Windows\Microsoft.NET\Framework64\v4.0.30319\Microsoft.Common.targets(483,9): error : The OutputPath property is not set for project 'HeuristicLab.Problems.Instances.DataAnalysis.Views-3.3.csproj'. Please check to make sure that you have specified a valid combination of Configuration and Platform for this project. Configuration='Release' Platform='x86'. You may be seeing this message because you are trying to build a project without a solution file, and have specified a non-default Configuration or Platform that doesn't exist for this project. [D:\HL3\trunk\sources\HeuristicLab.Problems.Instances.DataAnalysis.Views\3.3\HeuristicLab.Problems.Instances.DataAnalysis.Views-3.3.csproj]"
comment:7 Changed 12 years ago by abeham
r8781: added missing platforms in Problems.Instances.DataAnalysis.Views
comment:8 Changed 12 years ago by sforsten
- Owner changed from sforsten to abeham
- Status changed from accepted to reviewing
r8841: added Iris and Thyroid problems from UCI
comment:9 Changed 12 years ago by abeham
- Owner changed from abeham to sforsten
- Status changed from reviewing to assigned
I see a problem with Thyroid: there are many versions and different databases. To avoid confusion, we should distinguish them by name. I would suggest something like "Thyroid, S. Aeberhard, 1992" (name, donor, year received), as this seems to be the version that you have added. This should be done for the other UCI datasets as well.
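The naming scheme suggested above (name, donor, year received) could be sketched roughly as follows. This is only an illustration in Python, not HeuristicLab code; `DatasetLabel` and `display_name` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetLabel:
    """Hypothetical label that tells apart versions of a UCI dataset."""
    name: str    # dataset name, e.g. "Thyroid"
    donor: str   # donor with abbreviated first name
    year: int    # year the dataset was received

    def display_name(self) -> str:
        # Join the three parts in the order "name, donor, year".
        return f"{self.name}, {self.donor}, {self.year}"


print(DatasetLabel("Thyroid", "S. Aeberhard", 1992).display_name())
```

Keeping the three parts as separate fields, rather than as one preformatted string, would make it possible to sort or filter the instances by donor or year later on.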
comment:10 Changed 12 years ago by sforsten
- Owner changed from sforsten to abeham
- Status changed from assigned to reviewing
r8889: added IUCIDataDescriptor, which attaches more information to the datasets so that different versions can be distinguished, as suggested by abeham
comment:11 follow-up: ↓ 21 Changed 12 years ago by abeham
I looked through some other available datasets and found two that could be interesting to add as benchmarks:
- Vertebral Column (the 3-class dataset) has 310 instances and 6 attributes
- Parkinson: remove the first attribute (it's a string) and move the class attribute to the end. It seems to be suited for k-NN (after a quick analysis with Weka), has few instances and a skewed class distribution
Please shuffle both datasets randomly before adding them.
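The preprocessing described above (drop the leading string attribute, move the class attribute to the end, shuffle the rows) might look roughly like this. It is a minimal Python sketch, not the code that was committed; `prepare_rows` and the toy column names are assumptions made for illustration, and the fixed seed is only there to keep the result reproducible.

```python
import random


def prepare_rows(header, rows, class_name, seed=42):
    """Drop column 0, move the class column to the end, then shuffle rows."""
    class_idx = header.index(class_name)
    if class_idx == 0:
        raise ValueError("the class column must not be the dropped first column")
    # Keep every column except the first and the class column, then append the class.
    keep = [i for i in range(1, len(header)) if i != class_idx] + [class_idx]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    # Shuffle with a private, seeded generator so the global RNG stays untouched.
    random.Random(seed).shuffle(new_rows)
    return new_header, new_rows


# Toy data with hypothetical column names, not the real dataset.
header = ["name", "f1", "status", "f2"]
rows = [["a", 1.0, 1, 2.0], ["b", 3.0, 0, 4.0]]
new_header, new_rows = prepare_rows(header, rows, "status")
print(new_header)
```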
comment:12 Changed 12 years ago by abeham
- Owner changed from abeham to sforsten
- Status changed from reviewing to assigned
comment:13 Changed 12 years ago by sforsten
r8902: fixed CreateGpSymbolicClassificationSampleTest and therefore also RunGpSymbolicClassificationSampleTest
comment:14 Changed 12 years ago by sforsten
- Owner changed from sforsten to abeham
- Status changed from assigned to reviewing
- added two additional classification instances as suggested by abeham
- fixed a bug in UCIInstanceProvider
comment:15 Changed 12 years ago by abeham
- Status changed from reviewing to readytorelease
r8908: Abbreviated first names as in the other datasets
comment:16 Changed 12 years ago by gkronber
- Summary changed from Addtitional classification problems to Additional classification problem instances
comment:17 Changed 12 years ago by sforsten
- Owner changed from abeham to sforsten
- Status changed from readytorelease to assigned
comment:18 Changed 12 years ago by sforsten
- Status changed from assigned to accepted
comment:19 Changed 12 years ago by sforsten
- Owner changed from sforsten to mkommend
- Status changed from accepted to reviewing
- added Wisconsin breast cancer problem instance
- corrected Iris dataset
- changed classification data descriptors to be able to set training and test partition as well as input and target variables (in the same way as it is done in regression)
comment:20 Changed 12 years ago by mkommend
- Status changed from reviewing to readytorelease
comment:21 in reply to: ↑ 11 Changed 11 years ago by gkronber
- Owner changed from mkommend to gkronber
- Status changed from readytorelease to assigned
Replying to abeham:
> I looked through some other available datasets and found two that could be interesting to add as benchmarks:
> - Vertebral Column (the 3-class dataset) has 310 instances and 6 attributes
> - Parkinson: remove the first attribute (it's a string) and move the class attribute to the end. It seems to be suited for k-NN (after a quick analysis with Weka), has few instances and a skewed class distribution
>
> Please shuffle both datasets randomly before adding them.
I just checked the 'Parkinsons' data set and strongly believe that instances should not be shuffled. The data set contains multiple measurements for each patient. So, if the data set is shuffled, it should be guaranteed that all measurements from a patient end up either in the training partition or in the test partition, never in both.
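The constraint described above amounts to splitting by patient rather than by row. Below is a minimal Python sketch of such a grouped split, assuming each row carries some patient identifier; `grouped_split`, `group_of`, and the toy rows are hypothetical and not part of HeuristicLab.

```python
import random
from collections import defaultdict


def grouped_split(rows, group_of, test_fraction=0.33, seed=0):
    """Split rows so that every group lands entirely in train or in test."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_of(row)].append(row)
    # Shuffle the group ids, not the rows, so groups are never torn apart.
    ids = sorted(groups)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    test = [row for g in ids[:n_test] for row in groups[g]]
    train = [row for g in ids[n_test:] for row in groups[g]]
    return train, test


# Toy rows: (patient id, measurement). The ids are made up for illustration.
rows = [("p1", 0.1), ("p1", 0.2), ("p2", 0.3), ("p2", 0.4), ("p3", 0.5)]
train, test = grouped_split(rows, group_of=lambda row: row[0])
print(len(train), len(test))
```

Note that the fraction applies to the number of patients, not the number of rows, so the resulting partition sizes only approximate the requested ratio when patients have different numbers of measurements. With a single group, the training partition would be empty.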
comment:22 Changed 11 years ago by gkronber
- Owner changed from gkronber to abeham
- Status changed from assigned to reviewing
r9453: removed shuffling for the Parkinson's data set from UCI.
comment:23 Changed 11 years ago by abeham
- Status changed from reviewing to readytorelease
You're right, thx!
comment:24 Changed 11 years ago by swagner
- Resolution set to done
- Status changed from readytorelease to closed
- Version changed from 3.3.7 to 3.3.8
r8595: