Opened 2 weeks ago

Last modified 11 days ago

#3040 accepted enhancement

Vector-based GP

Reported by: pfleck
Owned by: pfleck
Priority: medium
Milestone:
Component: Problems.DataAnalysis.Symbolic
Version: branch
Keywords:
Cc:

Description

This ticket tracks the overall development of vector-based GP for time-series regression and classification.

The main idea is to support vectors as a new "datatype" in symbolic expression trees alongside regular numerical values. Additionally, new operators will be developed that work on those vectors and combine them with the existing numerical values.
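
As a rough illustration of the idea, the following Python sketch evaluates a small expression tree whose nodes return either a scalar or a vector. It is not the HeuristicLab implementation: the node classes (`Var`, `Const`, `Add`, `Mul`, `Mean`), the broadcasting rule and the choice of `Mean` as an aggregating operator are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# A node evaluates to either a scalar or a vector of doubles.
Value = Union[float, List[float]]


def _broadcast(op, a: Value, b: Value) -> Value:
    """Apply a binary operation, broadcasting a scalar over a vector."""
    a_vec, b_vec = isinstance(a, list), isinstance(b, list)
    if a_vec and b_vec:
        if len(a) != len(b):
            raise ValueError("vector lengths differ")
        return [op(x, y) for x, y in zip(a, b)]
    if a_vec:
        return [op(x, b) for x in a]
    if b_vec:
        return [op(a, y) for y in b]
    return op(a, b)


class Node:
    """Hypothetical base class for expression tree nodes."""

    def eval(self, row: Dict[str, Value]) -> Value:
        raise NotImplementedError


@dataclass
class Var(Node):
    name: str

    def eval(self, row: Dict[str, Value]) -> Value:
        return row[self.name]  # may be a scalar or a vector variable


@dataclass
class Const(Node):
    value: float

    def eval(self, row: Dict[str, Value]) -> Value:
        return self.value


@dataclass
class Add(Node):
    left: Node
    right: Node

    def eval(self, row: Dict[str, Value]) -> Value:
        return _broadcast(lambda x, y: x + y, self.left.eval(row), self.right.eval(row))


@dataclass
class Mul(Node):
    left: Node
    right: Node

    def eval(self, row: Dict[str, Value]) -> Value:
        return _broadcast(lambda x, y: x * y, self.left.eval(row), self.right.eval(row))


@dataclass
class Mean(Node):
    """Aggregating operator: reduces a vector to a scalar."""
    arg: Node

    def eval(self, row: Dict[str, Value]) -> Value:
        v = self.arg.eval(row)
        return sum(v) / len(v) if isinstance(v, list) else v


# One data row with a scalar variable "x" and a vector variable "ts".
row = {"x": 2.0, "ts": [1.0, 2.0, 3.0]}
scalar_tree = Add(Mul(Const(0.5), Mean(Var("ts"))), Var("x"))
vector_tree = Add(Var("ts"), Var("x"))
```

Here `scalar_tree` aggregates the vector variable before combining it with `x`, so it yields a single number per row, while `vector_tree` yields a vector. A regression model would need a scalar at the root, which is why some aggregating operator is required in this sketch.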

Developing the required features will likely involve implementing several components simultaneously and changing parts of the core DataAnalysis plugins. Development will therefore take place in a main branch, with separate trunk-reintegration branches to bring completed features back into the trunk.

Change History (4)

comment:1 Changed 2 weeks ago by pfleck

  • Status changed from new to accepted

r17362 Branched trunk

comment:2 Changed 2 weeks ago by pfleck

r17364

  • Added double vectors for Dataset. Extended the type-checks for DataAnalysisProblemData.
  • Added a small benchmark instance with data containing vectors. Adapted the ArtificialRegressionDataDescriptor to be able to specify non-double values.

Additional thoughts:

  • Consider ModifiableDataset and DataPreprocessing.
  • Consider adding generic vector capabilities to IDataset that only allows double, string, DateTime.
  • Consider changing IList within the Dataset to a covariant alternative (non-generic IReadOnlyList does not exist, however). Currently the type must be exactly IReadOnlyList<double>, otherwise the invariant IList<T> is not a subtype of IList<IList<double>> for instance.
  • Each DataAnalysis algorithm should check on its own whether the types of the allowed input variables are compatible. For instance, LR would only allow double values, whereas SymReg also supports string variables (as factor variables) and double-vector variables.
Last edited 2 weeks ago by pfleck

comment:3 Changed 2 weeks ago by pfleck

r17365 Added explicit vector types to avoid type mismatches when representing vectors as IList<T>, List<T> or IReadOnlyList<T>.

Additional thoughts:

  • The IDataset interface (and its implementation) now contains a lot of methods due to all the different available types (double, string, DateTime and also vector-versions). In the future, this should be unified.
  • Whether the types of the input variables are allowed should be decided by the algorithms, rather than the ProblemData.
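
The second point, letting each algorithm decide which input types it accepts, could look roughly like the following Python sketch. The class names, the `supported_types` attribute and the `check_inputs` helper are hypothetical and only serve the illustration; the rule that LR accepts only double variables while SymReg also accepts string and double-vector variables is taken from comment:2.

```python
from typing import Dict, FrozenSet, List

# Hypothetical type tags for dataset variables.
DOUBLE, STRING, DOUBLE_VECTOR = "double", "string", "double-vector"


class Algorithm:
    """Hypothetical base class: each algorithm declares the types it supports."""
    supported_types: FrozenSet[str] = frozenset()

    def check_inputs(self, allowed_inputs: Dict[str, str]) -> List[str]:
        """Return the names of allowed input variables this algorithm cannot handle."""
        return [name for name, var_type in allowed_inputs.items()
                if var_type not in self.supported_types]


class LinearRegression(Algorithm):
    supported_types = frozenset({DOUBLE})


class SymbolicRegression(Algorithm):
    supported_types = frozenset({DOUBLE, STRING, DOUBLE_VECTOR})


# Allowed input variables of some problem data, mapped to their types.
inputs = {"x1": DOUBLE, "region": STRING, "signal": DOUBLE_VECTOR}

lr_unsupported = LinearRegression().check_inputs(inputs)
symreg_unsupported = SymbolicRegression().check_inputs(inputs)
```

With this split the problem data only describes the variables, and each algorithm reports the ones it cannot use: `lr_unsupported` lists the string and vector variables, while `symreg_unsupported` is empty.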

comment:4 Changed 11 days ago by pfleck

r17369 Added Vector symbols to TypeCoherentExpressionGrammar & fixes.
