At this year’s industrial challenge at the Genetic and Evolutionary Computation Conference GECCO 2014 (http://www.spotseven.de/gecco-challenge/gecco-challenge-2014/), the task was to forecast the emergence of ammonia in a river near Cologne on the basis of rainfall and several (chemical) water properties. On the one hand, this is an important issue: cattle manure and wastewater are the main sources of the increasing emergence of ammonia, which is highly toxic and a particular threat to fish and other aquatic creatures. On the other hand, this situation poses a challenging test case for modern time series prediction methods.
We decided to participate in this challenge and took the following approach, which combines the preprocessing and modeling strategies from our most recent publications on medical data analysis and stock market data forecasting:
- First, we preprocessed all given data and calculated short-term as well as long-term moving window averages for all variables.
- Then we used several regression methods available in HeuristicLab to train models of ammonia formation on the basis of the original variables as well as the moving window averages. We used the following modeling techniques: linear regression, random forests, support vector machines, and genetic programming. Each modeling method was run 10 times with varying modeling parameters for both modeling scenarios. For each scenario we used 80% of the given training data for training and the remaining 20% as validation data.
- Finally, from all models trained this way we selected the 5 models with the best fit on the validation data. The final test predictions were calculated as the sample-wise average of the selected models’ outputs.
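The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the HeuristicLab workflow itself: all function names are hypothetical, models are represented as simple callables, the 80/20 split is assumed to be chronological, and mean squared error stands in for "best fit", since neither the split order nor the fit measure is specified above.

```python
from statistics import mean


def moving_average(values, window):
    """Trailing moving-window average.

    Assumption: early samples use whatever shorter history is available.
    """
    averaged = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        averaged.append(mean(values[start:i + 1]))
    return averaged


def split_train_validation(rows, train_fraction=0.8):
    """Split rows into training and validation parts.

    Assumption: the split is chronological (first 80% for training).
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def mean_squared_error(model, rows):
    """Validation error of a model over (input, target) pairs.

    Assumption: MSE is used here as the measure of fit.
    """
    return mean((model(x) - y) ** 2 for x, y in rows)


def select_best(models, validation_rows, k=5):
    """Keep the k models with the lowest validation error."""
    ranked = sorted(models, key=lambda m: mean_squared_error(m, validation_rows))
    return ranked[:k]


def ensemble_predict(models, x):
    """Sample-wise average of the selected models' outputs."""
    return mean(m(x) for m in models)
```

In this sketch, `moving_average` would be called twice per variable, once with a short and once with a long window, and the trained candidates from all methods and parameter settings would be passed to `select_best` together before averaging their outputs with `ensemble_predict`.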
And then, at GECCO, the results of the competition were announced – we came in second!