Galaxy cluster classification with convolutional neural networks

During my time at ESA, I mentored a Master’s student, Matej Kosiba from Masaryk University, and we recently published a paper together on using convolutional neural networks to detect clusters of galaxies (https://arxiv.org/abs/2006.05998).

You see, the way galaxy clusters are detected is extremely outdated. Typically, a wavelet filtering algorithm is applied to X-ray images to pull out candidate clusters, and then a team of “experts” examines each candidate, often alongside the corresponding optical data of the same patch of sky, to verify whether the object really is a cluster. More often than not, the candidate lists are contaminated with point sources (stars, AGN), nearby galaxies and artefacts in the data. With perhaps 1,000 candidate objects and 2 experts looking at each one, this is still doable. But when the X-ray observatory eROSITA comes online, delivering on the order of 100,000 candidates, the traditional methods of detection will be inefficient.

The research therefore turned to machine learning, which is better adapted to dealing with huge numbers of tasks quickly and objectively. In particular, we turned to convolutional neural networks, which were developed specifically for image data. Convolutional neural networks (the details of which I’ll leave for another post) learn the hierarchical structure of features within images of the same class, and are relatively insensitive to small offsets or changes in the images.
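To make that last property concrete, here is a toy NumPy sketch (not the network from the paper) of the basic convolution + max-pooling step a CNN is built from: after pooling, a small shift of the object in the image barely changes the extracted features.

```python
# Illustrative sketch only: one convolution + max-pool step in NumPy,
# showing why pooled CNN features are relatively insensitive to small
# shifts of the object in the image.
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation (the 'convolution' in a CNN layer)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=4):
    """Downsample a feature map by taking the max over size x size blocks."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# A bright 3x3 blob (a crude stand-in for a cluster) and the same blob
# shifted sideways by one pixel.
img = np.zeros((16, 16))
img[6:9, 6:9] = 1.0
shifted = np.roll(img, 1, axis=1)

blob_kernel = np.ones((3, 3))  # responds strongly wherever the blob sits

f1 = max_pool(conv2d(img, blob_kernel))
f2 = max_pool(conv2d(shifted, blob_kernel))

# Both pooled maps peak at the same strength, in the same pooled cell:
# the one-pixel shift is absorbed by the pooling step.
print(f1.max(), f2.max(), np.argmax(f1) == np.argmax(f2))
```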

To train the network we require image data and the corresponding class labels. The image data are X-ray images stacked onto the optical images of the same fields. For the class labels, we took two approaches. The first used the power of citizen science to crowd-classify large samples of the data: we created a Zooniverse project that attracted thousands of volunteers. The second took labels assigned by 2 experts. We compared the two using a spectroscopically confirmed cross-calibration sample.
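A hypothetical sketch of what “stacking” the two wavelengths means in practice: each training example concatenates an X-ray cutout with the optical image of the same sky patch along the channel axis, paired with a binary cluster / not-cluster label (the function and array shapes here are illustrative, not the paper’s actual pipeline).

```python
# Hypothetical sketch of assembling a multi-channel training input:
# an X-ray cutout stacked onto the 3-channel optical image of the
# same field, plus a binary label (1 = cluster, 0 = not a cluster).
import numpy as np

def make_example(xray, optical_rgb, label):
    """Stack a 2D X-ray image onto a 3-channel optical image -> (H, W, 4)."""
    assert xray.shape == optical_rgb.shape[:2], "cutouts must cover the same field"
    image = np.concatenate([xray[..., None], optical_rgb], axis=-1)
    return image, label

# Stand-in data: real inputs would be calibrated survey cutouts, not noise.
rng = np.random.default_rng(0)
xray = rng.random((64, 64))
optical = rng.random((64, 64, 3))

image, label = make_example(xray, optical, 1)
print(image.shape)  # one 4-channel training example
```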

[Figure: ROC curves for the networks trained on expert and Zooniverse labels]

This is the money plot. It’s called a ROC curve (receiver operating characteristic curve) and shows the true positive rate vs the false positive rate for different classification thresholds. The idea is that if the lines lie on the diagonal, the networks are performing no better than chance; ideally you want the lines to pull up towards the top left-hand corner. In this plot we see the various networks we tried: CN-E (custom network trained on expert labels), MN-E (MobileNet trained on expert labels) and CN-Z (custom network trained on Zooniverse labels).
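For readers who haven’t met ROC curves before, here is a minimal NumPy sketch of how one is computed, by sweeping the classification threshold over a classifier’s predicted scores (the toy labels and scores are made up for illustration, not taken from the paper).

```python
# Sketch: compute a ROC curve by sweeping the decision threshold over
# predicted scores, then summarise it with the area under the curve
# (AUC): 0.5 means chance performance, 1.0 a perfect classifier.
import numpy as np

def roc_curve(labels, scores):
    """True/false positive rates at every score threshold."""
    order = np.argsort(-scores)        # sort examples by descending score
    labels = np.asarray(labels)[order]
    tps = np.cumsum(labels)            # true positives accepted at each cut
    fps = np.cumsum(1 - labels)        # false positives accepted at each cut
    tpr = tps / labels.sum()
    fpr = fps / (len(labels) - labels.sum())
    return fpr, tpr

# Toy example: a classifier that is clearly better than chance.
labels = np.array([1, 1, 0, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.2])

fpr, tpr = roc_curve(labels, scores)
# Trapezoidal-rule area under the curve.
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1])) / 2)
print(round(auc, 3))  # -> 0.889
```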

The main result of the paper is that CNNs can achieve over 90% accuracy in classifying clusters of galaxies on multi-wavelength data (X-ray plus optical), and in a tiny fraction of the time it takes an expert to do the same. Additionally, we found that the X-ray data were better for classifying clusters; however, this may be due to the limited depth of the optical data we had at hand.