Uncovering Representative Groups in Multidimensional Projections

Abstract

Multidimensional projection-based visualization methods typically rely on clustering and attribute selection mechanisms to enable visual analysis of multidimensional data. Clustering is often employed to group similar instances according to their distance in the visual space. However, considering only distances in the visual space may be misleading, both because of projection errors and because nothing guarantees that distinct clusters contain instances with different content. Identifying clusters made up of only a few elements is also an issue for most clustering methods. In this work we propose a novel multidimensional projection-based visualization technique that relies on representative instances to define clusters in the visual space. Representative instances are selected by a deterministic sampling scheme derived from matrix decomposition, which is sensitive to the variability of the data while still being able to handle classes with a small number of instances. Moreover, the sampling mechanism can easily be adapted to select relevant attributes from each cluster. Therefore, our methodology unifies sampling, clustering, and feature selection in a simple framework. A comprehensive set of experiments validates our methodology, showing that it outperforms most existing sampling and feature selection techniques. A case study shows the effectiveness of the proposed methodology as a visual data analysis tool.