id summary reporter owner description type status priority milestone component version resolution keywords cc
2906 Variable-Transformations for Data Analysis pfleck pfleck "The current transformation feature (implemented during the first version of the data preprocessing) is neither practical nor functioning satisfactorily.
Originally, the transformation feature was intended to support data analysis by letting users specify transformations of the data that make training easier for the learning algorithms.
Possible usage scenarios are:
- Scale variables to a given range (e.g. 0 - 1)
- Z-Normalize a variable
- log-Transform a variable
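
A minimal sketch of these three scenarios, written in plain Python purely for illustration. The function names are hypothetical and are not part of the existing preprocessing code; the z-normalization here assumes the population standard deviation.

```python
import math


def scale_to_range(values, lo=0.0, hi=1.0):
    # Linearly map the values so that min(values) -> lo and max(values) -> hi.
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError('cannot scale a constant variable')
    factor = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * factor for v in values]


def z_normalize(values):
    # Shift to mean 0 and scale to standard deviation 1
    # (population standard deviation, an assumption of this sketch).
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        raise ValueError('cannot z-normalize a constant variable')
    return [(v - mean) / std for v in values]


def log_transform(values):
    # Natural logarithm; only defined for strictly positive values.
    if any(v <= 0 for v in values):
        raise ValueError('log transform requires strictly positive values')
    return [math.log(v) for v in values]
```

Note that scaling and z-normalization depend on parameters estimated from the data (min/max, mean/standard deviation), so those parameters would have to be stored with the transformation to apply it to new data later.
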
After training with transformed variables, an intermediate step is required that applies the same transformation to the original values before feeding them to the actual model.
This creates two options for calculating the model accuracy (R², MSE, ...), depending on whether the calculation is based on
- the transformed variables or
- the original variables.
While the former describes the model accuracy in terms of the training algorithm, the latter describes how the model actually performs in real use.
Currently, we are not sure which option is better; therefore, we want to support both options.
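
To illustrate how the two options can differ, here is a minimal Python sketch. It assumes a transformed target, a model that predicts in the transformed space, and a known inverse of the transformation; evaluate_both, forward and inverse are hypothetical names chosen for this example only.

```python
import math


def mse(targets, predictions):
    # Mean squared error between two equally long sequences.
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def evaluate_both(y_original, predictions_transformed, forward, inverse):
    # Option 1: compare in the transformed space (what the learner optimizes).
    y_transformed = [forward(y) for y in y_original]
    mse_transformed = mse(y_transformed, predictions_transformed)
    # Option 2: map predictions back and compare in the original space
    # (what a user of the model actually observes).
    predictions_original = [inverse(p) for p in predictions_transformed]
    mse_original = mse(y_original, predictions_original)
    return mse_transformed, mse_original


# Made-up example: log-transformed target, predictions off by 0.1 in log space.
y = [1.0, 10.0, 100.0]
preds = [math.log(v) + 0.1 for v in y]
mse_t, mse_o = evaluate_both(y, preds, math.log, math.exp)
print(mse_t, mse_o)
```

In this made-up example a constant error in log space becomes an error proportional to the target value in the original space, so the two measures can rank models differently.
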
==== Additional thoughts ====
- Performing the intermediate step of transforming the original variables before feeding them to the actual model could be done with a ""Transformation-Model"" that wraps the original model.
- From the users' perspective, the transformations could be done ""explicitly"" or ""hidden"": either the transformed variables are shown in the Dataset and offered as additional input or target variables, or the original Dataset is shown and the transformation is performed without being visible to the user. Currently, we want to make transformations explicitly visible to the user.
- Each transformation must also specify an inverse transformation that has to be applied in case a transformation is performed on the target variable. For instance, if the target variable is log-transformed, the intermediate model must use the exponential function to transform the target back to its original value range.
- For symbolic regression, the intermediate model can also be applied by directly changing the model tree." feature request accepted high HeuristicLab 3.3.17 Problems.DataAnalysis branch