Publication of the week: Ammar Al Abd Alazeez, Sabah Jassim and Hongbo Du

27 March 2017

Alazeez, A., S. Jassim & H. Du, “EINCKM: An enhanced prototype-based method for clustering evolving data streams in big data”, Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2017) (Porto, February 2017), 173-183. DOI: 10.5220/0006196901730183.

Data stream clustering is becoming an active research area in big data. It refers to grouping constantly arriving new data records in large chunks to enable dynamic analysis and updating of the information patterns conveyed by the existing clusters, the outliers, and the newly arriving data chunk. Prototype-based algorithms for solving the problem are promising because of their simplicity and efficiency. However, existing implementations are limited in the quality of their clusters and their ability to discover outliers, and they give little consideration to possible new patterns in different chunks. In this paper, a new incremental algorithm called Enhanced Incremental K-Means (EINCKM) is developed. The algorithm is designed to detect new clusters in an incoming data chunk, merge new clusters and existing outliers into the currently existing clusters, and generate modified clusters and outliers ready for the next round. The algorithm applies a heuristic-based method to estimate the number of clusters (K), a radius-based technique to determine and merge overlapped clusters, and a variance-based mechanism to discover the outliers. The algorithm was evaluated on synthetic and real-life datasets. The experimental results indicate improved clustering correctness with a time complexity comparable to that of existing methods dealing with the same kind of problems.
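To make the radius-based merge and variance-based outlier steps more concrete, here is a minimal Python sketch. It is not the authors' EINCKM implementation: the abstract gives no formulas, so the overlap rule (centroid distance no larger than the sum of the radii), the outlier rule (distance above the mean plus `k_sigma` standard deviations), and every name below (`Cluster`, `make_cluster`, `overlaps`, `merge_overlapping`, `split_outliers`, `k_sigma`) are assumptions chosen for illustration. The heuristic estimate of K is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    centroid: tuple  # mean of the member points
    radius: float    # distance from the centroid to the farthest member
    points: list     # members are kept in memory only to keep the sketch short


def make_cluster(points):
    """Build a cluster summary (centroid and radius) from a list of points."""
    dim = len(points[0])
    centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    radius = max(math.dist(centroid, p) for p in points)
    return Cluster(centroid, radius, list(points))


def overlaps(a, b):
    # Assumed rule: two clusters overlap when the distance between their
    # centroids does not exceed the sum of their radii.
    return math.dist(a.centroid, b.centroid) <= a.radius + b.radius


def merge_overlapping(clusters):
    """Fold each cluster into any already accepted cluster it overlaps."""
    merged = []
    for cluster in clusters:
        current = cluster
        changed = True
        while changed:
            changed = False
            for other in merged:
                if overlaps(current, other):
                    merged.remove(other)
                    current = make_cluster(current.points + other.points)
                    changed = True
                    break  # restart the scan, since the merged radius may have grown
        merged.append(current)
    return merged


def split_outliers(cluster, k_sigma=2.0):
    # Assumed rule: a member is an outlier when its distance to the centroid
    # exceeds the mean member distance by more than k_sigma standard deviations.
    dists = [math.dist(cluster.centroid, p) for p in cluster.points]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    limit = mean + k_sigma * std
    inliers = [p for p, d in zip(cluster.points, dists) if d <= limit]
    outliers = [p for p, d in zip(cluster.points, dists) if d > limit]
    return inliers, outliers
```

In a streaming setting, one round would pass the existing clusters together with the clusters found in the new chunk to `merge_overlapping`, then call `split_outliers` on each result and carry the outliers over to the next round. A real implementation would keep only summary statistics for each cluster rather than every point.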

Read more in the SciTePress Digital Library.

Ammar Al Abd Alazeez is a research student in the Department of Applied Computing at Buckingham. Sabah Jassim is Professor of Mathematics and Computation, and Hongbo Du is Senior Lecturer in Computing.