Evolving GP classifiers for streaming data tasks with concept change and label budgets: A benchmarking study


Ali Vahdat, Jillian Morgan, Andrew R. McIntyre, Malcolm I. Heywood and Nur Zincir-Heywood

Author Addresses: 

Faculty of Computer Science
Dalhousie University
6050 University Ave.
PO Box 15000
Halifax, Nova Scotia, Canada
B3H 4R2


Streaming data classification raises several challenges that are not typically encountered in offline supervised learning formulations. Specifically, access to data at any training generation is limited to a small subset of the data, and the data itself is potentially generated by a non-stationary process. Moreover, requesting labels incurs a cost, so a label budget is enforced. Finally, an anytime classification requirement implies that it must be possible to identify a ‘champion’ classifier for predicting labels as the stream progresses. In this work, we propose a general framework for deploying genetic programming (GP) for streaming data classification under these constraints. The framework consists of a sampling policy and an archiving policy that enforce criteria for selecting data to appear in a data subset. Only the exemplars in the data subset are labeled, and training epochs are performed against the content of the data subset. Specific recommendations include support for GP task decomposition / modularity and performing additional training epochs per data subset. Both recommendations significantly improve the baseline performance of GP under streaming data with label budgets. Benchmarking issues addressed include the identification of datasets and performance measures.
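The interaction between the sampling policy, the archiving policy, the label budget and the per-subset training epochs can be pictured as a simple loop. The sketch below is a minimal, hypothetical illustration and not the report's method: the names `LabelBudgetedSubset` and `run_stream`, the choice of uniform random sampling within each stream window, the FIFO archiving policy, and the reading of the budget as "labels requested never exceed budget × exemplars seen" are all assumptions made for illustration. GP training itself is abstracted as a `train_epoch` callback.

```python
import random
from collections import deque
from typing import Callable, Iterable, List, Sequence, Tuple

# An exemplar pairs a feature vector with its true label; the label is
# treated as hidden until the sampling policy "requests" it.
Exemplar = Tuple[Sequence[float], int]


class LabelBudgetedSubset:
    """Hypothetical data subset maintained under a label budget.

    Assumptions for illustration only: the sampling policy draws exemplars
    uniformly at random from each stream window, and the archiving policy is
    FIFO (the oldest labeled exemplar is evicted once the subset is full).
    """

    def __init__(self, subset_size: int, budget: float, seed: int = 0) -> None:
        self.subset: deque = deque(maxlen=subset_size)  # FIFO archiving policy
        self.budget = budget            # fraction of exemplars seen that may be labeled
        self.labels_requested = 0
        self.seen = 0
        self.rng = random.Random(seed)

    def sample_window(self, window: List[Exemplar]) -> int:
        """Uniform sampling policy; returns how many labels were requested."""
        self.seen += len(window)
        # Labels still available so that requested <= budget * seen.
        allowed = int(self.budget * self.seen) - self.labels_requested
        k = max(0, min(allowed, len(window)))
        for exemplar in self.rng.sample(window, k):
            self.labels_requested += 1   # each label request consumes budget
            self.subset.append(exemplar)  # deque drops the oldest when full
        return k


def run_stream(
    windows: Iterable[List[Exemplar]],
    data_subset: LabelBudgetedSubset,
    train_epoch: Callable[[List[Exemplar]], None],
    epochs_per_subset: int = 1,
) -> None:
    """Refresh the data subset once per window, then train against it only."""
    for window in windows:
        data_subset.sample_window(window)
        for _ in range(epochs_per_subset):
            train_epoch(list(data_subset.subset))  # e.g. one GP generation
```

For example, with ten windows of 100 exemplars, a 5% budget and a subset of size 20, the loop requests 50 labels in total, and `epochs_per_subset=3` gives three training epochs against each refreshed data subset. Swapping in a different sampling or archiving policy only changes `sample_window`, which is the separation of concerns the framework describes.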

Tech Report Number: 
Report Date: April 21, 2016
CS-2016-02.pdf (708.36 KB)