Dr Matthew Burfitt discussing Mapper-type Algorithms

21 August 2019

University of Southampton speaker, Dr Matthew Burfitt, explains that this research investigates the stability of the Topological Data Analysis tool of Mapper in clustering various types of data into a limited number of clustered subsets. Mapper, works first by choosing a covering (with some overlapping) of the underlying metric space Rn of the data, and clustering the data inside each member of the covering, using a clustering algorithm. Then a graph is created whose nodes are the set of all clusters and edges are drawn between clusters that in adjacent covering sets as long as they share some data. The output graph may or may not be planar but Mapper simply map the graph into R3 with no crossing, i.e.  Mapper provides a low dimensional visualisation of data of any dimension in terms of the topology of the clusters centroids, but the Mapper output varies a lot with changes in the input cluster parameters. Many approaches to selecting a covering of the underlying metric space Rn. The classical approach uses filter functions: , and construct the pre-images of a set of open intervals with overlaps.

Parameters influencing the performance of mapper, relate to the choice of covering and the clustering algorithm(s).  The covering parameters in the classical case include the filter, the number of intervals and the width of overlap. Trial and error is the only way of selecting parameters with dramatically different results. The main challenge is the Kleinberg impossibility theorem: “There is no clustering function that satisfies Scale-Invariance, Richness, and Consistency”.

In order to define clustering instability, several cluster quality measures were reviewed.  Applying any quality function to alternative clustering helps remove the need of a unique existing minimal clustering function. Working around the permutations of data points would require minimising the matching distance within the clustering map.  This was illustrated by using the Voronoi diagram.

Clustering instability was then generalised into a Mapper setting. In particular, a numerical measure of the stability of Mapper was proposed in terms of the clustering sensitivity to these changes. This allows to obtain high values of Mapper instability and experimentally demonstrate how it can be applied to determine good Mapper outputs over parameters variation.

The speaker illustrated these concepts with several examples. The adapted approach tackles the instability of the clustering irrespective of the clustering algorithm, and requires very few of the specifics in the data, making it applicable to any Mapper-type tool. The speaker points out that this approach performed well on datasets in breast cancer and diabetes.

The presentation simulated a serious discussion on the applicability of this topic to many of the research data analysis at Buckingham, and several interesting questions were raised that help better our understanding of the main challenges in the field of data science.

Guest speakers presenting at the School of Computing give postgraduate and undergraduate students an insight into their research, showing them range of research projects which have been undertaken. This is an opportunity to broaden their understanding of computing and identify areas of interest for further study. All are encouraged to attend.

Check out the upcoming School of Computing events and seminars.

Find out more about our courses in computing.