4.3 Transfer to an Unlabeled Task using kernel marginal predictors (Gilles Blanchard)

StatLearn 2012 - Workshop on "Challenging problems in Statistical Learning"


We consider a classification problem: the goal is to assign class labels to an unlabeled test data set, given several labeled training data sets drawn from different but similar distributions. In essence, the goal is to predict labels from an estimate of the marginal distribution of the unlabeled data, by learning the trends present in related classification tasks that are already known. In this sense, the problem belongs to the category of so-called "transfer learning" in machine learning. The probabilistic model assumes that the training and test distributions are themselves i.i.d. realizations from a distribution on distributions. Conceptually, this setting can be related to traditional random effects models in statistics, although here the approach is nonparametric and distribution-free. The problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a kernel-based approach to the problem, which involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over that space. We present a generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on flow cytometry data are presented.
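To make the idea of predicting from the marginal distribution concrete, the following is a minimal sketch and not the implementation from the talk. It rests on several assumptions that the abstract does not state: the kernel on (marginal, point) pairs is taken to be a product of a Gaussian kernel between empirical kernel mean embeddings of the samples and a Gaussian kernel between points; the regularized empirical risk uses the squared loss, so it reduces to kernel ridge regression; and all function names and default parameters (gaussian_gram, task_kernel, fit_predict, bandwidth, task_bandwidth, reg) are hypothetical.

```python
import numpy as np

def gaussian_gram(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def embedding_inner(A, B, bandwidth):
    """Inner product of the empirical kernel mean embeddings of samples A and B."""
    return gaussian_gram(A, B, bandwidth).mean()

def task_kernel(samples_a, samples_b, bandwidth, task_bandwidth):
    """Gaussian kernel between empirical marginals (assumed form, see lead-in)."""
    self_a = [embedding_inner(A, A, bandwidth) for A in samples_a]
    self_b = [embedding_inner(B, B, bandwidth) for B in samples_b]
    K = np.empty((len(samples_a), len(samples_b)))
    for i, A in enumerate(samples_a):
        for j, B in enumerate(samples_b):
            # Squared RKHS distance between the two mean embeddings.
            d2 = self_a[i] + self_b[j] - 2.0 * embedding_inner(A, B, bandwidth)
            K[i, j] = np.exp(-max(d2, 0.0) / (2.0 * task_bandwidth**2))
    return K

def fit_predict(train_X, train_y, test_X,
                bandwidth=1.0, task_bandwidth=1.0, reg=1e-2):
    """Kernel ridge regression on pooled (marginal, point) pairs.

    train_X: list of (n_i, d) arrays, one per labeled training task
    train_y: list of (n_i,) arrays with labels in {-1, +1}
    test_X:  (m, d) array from the unlabeled test task
    Returns an (m,) array of predicted signs.
    """
    X = np.vstack(train_X)
    y = np.concatenate(train_y)
    task_id = np.concatenate(
        [np.full(len(Xi), i) for i, Xi in enumerate(train_X)])

    # Product kernel: (kernel between marginals) * (kernel between points).
    KP = task_kernel(train_X, train_X, bandwidth, task_bandwidth)
    K = KP[np.ix_(task_id, task_id)] * gaussian_gram(X, X, bandwidth)

    # Squared-loss regularized empirical risk has a closed-form minimizer.
    n = len(y)
    alpha = np.linalg.solve(K + reg * n * np.eye(n), y)

    # The test marginal is estimated from the unlabeled test sample itself.
    KP_test = task_kernel([test_X], train_X, bandwidth, task_bandwidth)
    K_test = KP_test[0, task_id][None, :] * gaussian_gram(test_X, X, bandwidth)
    return np.sign(K_test @ alpha)
```

In this sketch the unlabeled test sample enters twice: once as the set of points to classify, and once as the empirical marginal that determines how much each training task contributes to the prediction. Any other loss, or any other kernel on distributions, could be substituted.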