2) Machine Learning In Multi-Wavelength Galaxy/Quasar Evolution: Photometric Redshift Estimation
Photometric redshift estimation is currently the most powerful and efficient way to estimate the distances to extragalactic sources. The volume of survey data continues to grow exponentially, which calls for low-cost, fast and efficient data-driven methods to analyse the data and make predictions from it. In this study, we present the supervised machine learning algorithms used to estimate the photometric redshifts of the galaxies and quasars found in a cross-match of the Sloan Digital Sky Survey Data Release 16 (SDSS DR16) and WISE datasets. We adopt K-Nearest Neighbour (KNN) and Random Forest (RF) regressors to estimate the photometric redshifts of 285685 galaxies and 124688 quasars from their photometric measurements.
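As a rough illustration of the regression setup described above, the sketch below fits scikit-learn's KNeighborsRegressor and RandomForestRegressor to a synthetic catalogue. The data, the four colour features, the toy redshift relation and the hyper-parameter values are all assumptions made for the example; they are not the SDSS/WISE cross-match or the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a photometric catalogue (assumption): four colours
# per source, e.g. u-g, g-r, r-i, i-z, and a toy redshift that depends on them.
n_sources = 2000
colours = rng.normal(size=(n_sources, 4))
z_spec = np.abs(
    0.5 + 0.3 * colours[:, 0] - 0.2 * colours[:, 2]
    + 0.05 * rng.normal(size=n_sources)
)

# Hold out 20% of the sources to evaluate the predictions.
X_train, X_test, z_train, z_test = train_test_split(
    colours, z_spec, test_size=0.2, random_state=0
)

# Illustrative hyper-parameters, not the values used in the study.
models = {
    "KNN": KNeighborsRegressor(n_neighbors=10),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, z_train)
    predictions[name] = model.predict(X_test)
    rms_error = np.sqrt(np.mean((predictions[name] - z_test) ** 2))
    print(f"{name}: RMS error = {rms_error:.3f}")
```

With a real catalogue, the `colours` array would be replaced by the measured magnitudes or colours and `z_spec` by the spectroscopic redshifts.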
The first figure on the left is the colour-colour diagram showing the distribution of the observed galaxies and quasars, with a colour bar of true spectroscopic redshifts. The left plot shows the observed galaxy count distribution on the 2D grid of r-i vs u-g, and the right plot shows the distribution of observed quasars on the same r-i vs u-g grid. The figure below on the left shows the normalised redshift estimation error, Δz_norm, as a function of the spectroscopic redshift of the galaxies. The last figure shows the redshifts predicted by the K-Nearest Neighbour algorithm as a function of the spectroscopic redshift. The left plot shows z_phot vs z_spec for galaxies and the right plot shows z_phot vs z_spec for quasars.
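The text does not spell out how the normalised error is computed. A minimal sketch, assuming the convention common in the photometric redshift literature, Δz_norm = (z_phot - z_spec) / (1 + z_spec), together with the usual RMS and NMAD summary statistics, could look like this. The function names and the four example redshifts are hypothetical.

```python
import numpy as np


def normalised_error(z_phot, z_spec):
    """Normalised redshift error, assuming (z_phot - z_spec) / (1 + z_spec)."""
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    return (z_phot - z_spec) / (1.0 + z_spec)


def rms(dz):
    """Root mean square of the normalised errors."""
    return float(np.sqrt(np.mean(np.square(dz))))


def nmad(dz):
    """Normalised median absolute deviation: 1.4826 * median(|dz - median(dz)|)."""
    dz = np.asarray(dz, dtype=float)
    return float(1.4826 * np.median(np.abs(dz - np.median(dz))))


# Made-up redshifts, for illustration only.
z_spec = np.array([0.1, 0.5, 1.0, 2.0])
z_phot = np.array([0.12, 0.47, 1.10, 1.90])

dz = normalised_error(z_phot, z_spec)
print("dz_norm:", dz)
print("RMS:", rms(dz))
print("NMAD:", nmad(dz))
```

The factor 1.4826 scales the median absolute deviation so that it matches the standard deviation for normally distributed errors, which makes NMAD less sensitive to outliers than the RMS.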
I would like to thank my supervisor, Prof. Mattia Vaccari, for his support throughout the project. I am also grateful to Chaka Mofokeng for his help with writing the code. I also appreciate the support of my colleague Yaaseen Jones, my friends and my family. The work uses the SDSS DR16 data, which is the fourth data release of the fourth phase of the Sloan Digital Sky Survey, and AllWISE data products from the Wide-field Infrared Survey Explorer. Without the two surveys this work would have been impossible.
3) Hyperparameter optimization for XGBoost
This project builds on the regression repository. It focuses on optimizing the hyper-parameters of the XGBoost regressor to best estimate the photometric redshifts of the quasars and star-forming galaxies under study. We used 80% (about one million spectroscopically confirmed SDSS sources) of the dataset for training the algorithm and 20% for testing. We used scikit-learn's RandomizedSearchCV with r2_score, the median absolute deviation, and both of them together as the scoring metrics in different experiments (see the github f…) to find the best parameters for our testing data. Scoring with the median absolute deviation gives the best RMS and NMAD for this project.
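The search described above can be sketched roughly as follows. This is an illustration under several assumptions: the data are synthetic, the parameter grid and `n_iter` are made up, and scikit-learn's built-in `neg_median_absolute_error` scorer is assumed to stand for the median absolute deviation metric mentioned in the text. If the xgboost package is not installed, the sketch falls back to scikit-learn's GradientBoostingRegressor as a stand-in, which accepts the same four parameters.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV, train_test_split

try:
    from xgboost import XGBRegressor as Booster
except ImportError:
    # Stand-in when xgboost is unavailable; not the regressor used in the project.
    from sklearn.ensemble import GradientBoostingRegressor as Booster

rng = np.random.default_rng(0)

# Synthetic features and toy redshifts (assumption, not the SDSS data).
X = rng.normal(size=(600, 4))
z = np.abs(0.5 + 0.3 * X[:, 0] - 0.2 * X[:, 2] + 0.05 * rng.normal(size=600))

# 80% for training and 20% for testing, as in the text.
X_train, X_test, z_train, z_test = train_test_split(
    X, z, test_size=0.2, random_state=0
)

# Hypothetical search space; the project's actual ranges are not given here.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4, 6],
    "learning_rate": [0.03, 0.1, 0.3],
    "subsample": [0.6, 0.8, 1.0],
}

search = RandomizedSearchCV(
    Booster(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    # Track both metrics, but pick the final model by median absolute error.
    scoring={"r2": "r2", "medae": "neg_median_absolute_error"},
    refit="medae",
    cv=3,
    random_state=0,
)
search.fit(X_train, z_train)

z_pred = search.predict(X_test)
print("Best parameters:", search.best_params_)
print("Best CV median absolute error:", -search.best_score_)
```

Passing a dictionary to `scoring` records every metric in `search.cv_results_`, while `refit` decides which one selects the final model, so the single-metric and combined experiments can be compared from one run.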