Opened 3 months ago

Last modified 40 hours ago

#3040 accepted enhancement

Vector-based GP

Reported by: pfleck Owned by: pfleck
Priority: medium Milestone:
Component: Problems.DataAnalysis.Symbolic Version: branch
Keywords: Cc:

Description

This ticket tracks the overall development of Vector-based GP for Time-Series Regression and Classification.

The main idea is to support vectors as a new "datatype" in symbolic expression trees alongside regular numerical values. Additionally, new operators will be developed that operate on those vectors and combine them with the existing numerical values.
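To illustrate the idea, the following is a minimal sketch of how scalar and vector values could coexist during expression evaluation. It is not the branch's implementation: the `Value` class and the `VectorOps.Add` and `VectorOps.Mean` helpers are hypothetical names chosen for this example, and plain `double[]` arrays stand in for whatever vector type is eventually used.

```csharp
using System;
using System.Linq;

// Hypothetical value type: holds either a scalar or a vector.
public sealed class Value {
  public double Scalar { get; }
  public double[] Vector { get; } // null for scalar values

  public Value(double scalar) { Scalar = scalar; Vector = null; }
  public Value(double[] vector) { Scalar = double.NaN; Vector = vector; }

  public bool IsVector => Vector != null;
}

public static class VectorOps {
  // Addition: scalar + scalar, element-wise vector + vector,
  // or a scalar broadcast over every element of a vector.
  public static Value Add(Value a, Value b) {
    if (!a.IsVector && !b.IsVector)
      return new Value(a.Scalar + b.Scalar);

    if (a.IsVector && b.IsVector) {
      if (a.Vector.Length != b.Vector.Length)
        throw new ArgumentException("Vector lengths differ.");
      return new Value(a.Vector.Zip(b.Vector, (x, y) => x + y).ToArray());
    }

    double s = a.IsVector ? b.Scalar : a.Scalar;
    double[] v = a.IsVector ? a.Vector : b.Vector;
    return new Value(v.Select(x => x + s).ToArray());
  }

  // Aggregation: reduces a vector to a scalar; scalars pass through unchanged.
  public static Value Mean(Value a) {
    return a.IsVector ? new Value(a.Vector.Average()) : a;
  }
}
```

With these helpers, an expression such as mean(x + 2) for a vector variable x could be evaluated as `VectorOps.Mean(VectorOps.Add(new Value(x), new Value(2.0)))`, which yields a scalar that the existing numerical operators can consume.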

Because developing the required features will likely involve implementing several components simultaneously, along with some changes within the core DataAnalysis plugins, there will be a main branch in which development will take place, with some trunk-reintegration branches to get completed features back into the trunk.

Change History (16)

comment:1 Changed 3 months ago by pfleck

  • Status changed from new to accepted

r17362 Branched trunk

comment:2 Changed 3 months ago by pfleck

r17364

  • Added double vectors for Dataset. Extended the type-checks for DataAnalysisProblemData.
  • Added a small benchmark instance with data containing vectors. Adapted the ArtificialRegressionDataDescriptor to be able to specify non-double values.

Additional thoughts:

  • Consider ModifiableDataset and DataPreprocessing.
  • Consider adding generic vector capabilities to IDataset that only allows double, string, DateTime.
  • Consider changing IList within the Dataset to a covariant alternative (a non-generic IReadOnlyList does not exist, however). Currently the type must be exactly IReadOnlyList<double>; otherwise, because IList<T> is invariant, it is not a subtype of IList<IList<double>>, for instance.
  • Each DataAnalysis algorithm should check on its own whether the types of the allowed input variables are compatible. For instance, the LR would only allow double-values, whereas SymReg also supports string-variables (as factor variables) and double-vector-variables.
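
The variance issue mentioned in the list above can be shown in isolation. The sketch below uses only standard .NET collection interfaces; the `CovarianceSketch` class and its methods are hypothetical and not part of HeuristicLab.

```csharp
using System;
using System.Collections.Generic;

public static class CovarianceSketch {
  // Hypothetical helper: accepts any read-only list of read-only vectors.
  public static int CountValues(IReadOnlyList<IReadOnlyList<double>> rows) {
    int count = 0;
    foreach (var row in rows) count += row.Count;
    return count;
  }

  public static int Demo() {
    var data = new List<List<double>> {
      new List<double> { 1.0, 2.0 },
      new List<double> { 3.0 }
    };

    // Compiles: IReadOnlyList<out T> is covariant, and List<double>
    // implements IReadOnlyList<double>.
    IReadOnlyList<IReadOnlyList<double>> readOnlyView = data;

    // Does not compile: IList<T> is invariant, so List<List<double>>
    // cannot be implicitly converted to IList<IList<double>>.
    // IList<IList<double>> listView = data;

    return CountValues(readOnlyView);
  }
}
```

A covariant interface avoids having to copy or wrap the inner lists, at the cost of exposing only read access.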
Last edited 3 months ago by pfleck

comment:3 Changed 3 months ago by pfleck

r17365 Added explicit vector types to avoid type mismatches when representing vectors as IList<T>, List<T> or IReadOnlyList<T>.

Additional thoughts:

  • The IDataset interface (and its implementation) now contains a lot of methods due to all the different available types (double, string, DateTime and also vector-versions). In the future, this should be unified.
  • Whether the types of the input variables are allowed should be decided by the algorithms, rather than the ProblemData.

comment:4 Changed 3 months ago by pfleck

r17369 Added Vector symbols to TypeCoherentExpressionGrammar & fixes.

comment:5 Changed 6 weeks ago by pfleck

r17400 Added Azzali benchmarks

comment:6 Changed 6 weeks ago by pfleck

r17401 Added parser for new benchmark data but did not commit the data yet (too large)

comment:7 Changed 6 weeks ago by pfleck

r17403 Added fix for non-numeric class labels

comment:8 Changed 4 weeks ago by pfleck

r17414 Started adding UCI time series regression benchmarks. Adapted parser (extracted format options & added parsing for double vectors).

comment:9 Changed 4 weeks ago by pfleck

r17415 Added additional UCI instances for time series regression

comment:10 Changed 4 weeks ago by pfleck

r17416 enabled variable impacts for vectorial data (if vectors have the same length)

comment:11 Changed 3 weeks ago by pfleck

r17418

  • (partially) enabled data preprocessing for vectorial data
  • use flat zip-files for large benchmarks instead of embedded resources (faster build times)
  • added multiple variants of vector benchmark I (vector length constraints)

comment:12 Changed 3 weeks ago by pfleck

r17419 added missing source file

comment:13 Changed 4 days ago by pfleck

r17447 Added TransportPlugin for MathNet.Numerics.

comment:14 Changed 3 days ago by pfleck

r17448 Replaced own Vector with MathNet.Numerics Vector.

  • Used types are not yet storable.
  • I do not like the using DoubleVector = MathNet.Numerics.LinearAlgebra.Vector<double>; directive. Maybe I'll switch to using MathNet.Numerics.LinearAlgebra.Single; and only use Vector as the type.

comment:15 Changed 2 days ago by pfleck

r17449 Added Transformers for Vectors. Added specialized Transformers for double Dense/SparseVectorStorage and a generic mapper for the remaining (serializable) types.

comment:16 Changed 40 hours ago by pfleck

r17452 Improved Persistence for Vectors (removed the generic transformer and used the existing array transformer instead).
