Clustering of the values of a response variable and simultaneous covariate selection using a stepwise algorithm.

  • Competence Center for Methodology and Statistics
September 16, 2016 By:
  • Collignon O
  • Monnez JM.

In supervised learning, a response variable can take a very large number of values. Grouping these values into a few clusters can be useful for performing accurate supervised classification analyses. Selecting relevant covariates is likewise a crucial step in building robust and efficient prediction models. In this paper we propose an algorithm that simultaneously groups the values of a response variable into a limited number of clusters and selects, stepwise, the covariates that best discriminate this clustering. Both objectives are achieved by alternating optimization of a user-defined model selection criterion. This process extends a former version of the algorithm to a more general framework. Possible further developments are also discussed in detail.
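The alternating scheme summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the criterion, the admissible moves, or the initialization, so everything below is an assumption made for illustration. The function names (`cluster_and_select`, `stepwise_select`, `regroup_values`), the penalized explained-variance criterion, the round-robin starting partition, and the toy data are all hypothetical. The criterion is passed in as a function to mirror the "user-defined" aspect.

```python
def r_squared(column, groups):
    """Share of one covariate's variance explained by the grouping."""
    mean = sum(column) / len(column)
    total = sum((v - mean) ** 2 for v in column)
    if total == 0:
        return 0.0
    sums, counts = {}, {}
    for v, g in zip(column, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    between = sum(counts[g] * (sums[g] / counts[g] - mean) ** 2 for g in sums)
    return between / total


def explained_variance_criterion(X, groups, selected, penalty=0.05):
    """Stand-in criterion (higher is better); an assumption, not the paper's."""
    return sum(r_squared([row[j] for row in X], groups) - penalty
               for j in selected)


def stepwise_select(X, groups, selected, criterion):
    """Add or drop one covariate at a time while the criterion improves."""
    selected = set(selected)
    best = criterion(X, groups, sorted(selected))
    while True:
        # Toggling covariate j adds it if absent and drops it if present.
        trials = [sorted(selected ^ {j}) for j in range(len(X[0]))]
        score, choice = max((criterion(X, groups, t), t) for t in trials)
        if score <= best + 1e-12:
            return sorted(selected), best
        selected, best = set(choice), score


def regroup_values(X, y, clusters, selected, criterion, n_clusters):
    """Move response values between clusters while the criterion improves."""
    clusters = dict(clusters)
    best = criterion(X, [clusters[v] for v in y], selected)
    for value in sorted(clusters):
        home = clusters[value]
        if sum(1 for c in clusters.values() if c == home) == 1:
            continue  # never empty a cluster
        for k in range(n_clusters):
            if k == home:
                continue
            trial = {**clusters, value: k}
            score = criterion(X, [trial[v] for v in y], selected)
            if score > best + 1e-12:
                clusters, best = trial, score
    return clusters, best


def cluster_and_select(X, y, n_clusters, criterion, max_iter=50):
    """Alternate covariate selection and regrouping until no improvement."""
    values = sorted(set(y))
    clusters = {v: i % n_clusters for i, v in enumerate(values)}  # arbitrary start
    selected, score = [], float("-inf")
    for _ in range(max_iter):
        groups = [clusters[v] for v in y]
        selected, _ = stepwise_select(X, groups, selected, criterion)
        clusters, new_score = regroup_values(
            X, y, clusters, selected, criterion, n_clusters)
        if new_score <= score + 1e-12:
            break
        score = new_score
    return clusters, selected, score


# Toy data: six response values, two observations each. Covariate 0
# separates {a, b, c} from {d, e, f}; covariate 1 carries no group signal.
y = [v for v in "abcdef" for _ in range(2)]
X = [[(0.0 if v in "abc" else 10.0) + s, s]
     for v in "abcdef" for s in (-1.0, 1.0)]
clusters, selected, score = cluster_and_select(
    X, y, n_clusters=2, criterion=explained_variance_criterion)
print(clusters, selected)
```

Because each accepted move strictly increases the criterion, the loop stops at a local optimum that depends on the starting partition. A different criterion, such as a cross-validated error rate or an information criterion, could be swapped in through the `criterion` argument without changing the rest of the sketch.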

2016 Sep. Appl Math. 7(15):1639-1648.