As a result of recent advances in astronomical and digital technologies, astronomy is rapidly becoming a data-rich science. The much-increased data rates from radio surveys with the MeerKAT telescope, the Australian Square Kilometre Array Pathfinder (ASKAP), and eventually the Square Kilometre Array (SKA) require the adoption of machine-learning techniques to automate most tasks previously carried out manually by astronomers. One such task is classifying radio sources as star-formation- or accretion-dominated. Both of these processes can be traced via synchrotron emission at radio wavelengths.
However, reliable automated classification of radio sources as star-formation- or accretion-dominated is non-trivial and often requires extensive use of multi-wavelength data. This classification is a necessary first step towards understanding the nature of the sources detected in radio continuum surveys.
In this study, we implement and optimise five supervised machine-learning techniques (Logistic Regression, Support Vector Machine, K-Nearest Neighbour, Random Forest and XGBoost) to classify radio sources detected in the MeerKAT International GHz Tiered Extragalactic Exploration (MIGHTEE)-COSMOS survey as star-formation- or accretion-dominated.
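The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline: the data are synthetic (`make_classification`), the F1 score and 5-fold cross-validation are assumptions, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example needs only scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a catalogue of radio sources with multi-wavelength
# features; label 1 = accretion-dominated, 0 = star-formation-dominated.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with a StandardScaler.
classifiers = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "K-Nearest Neighbour": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(random_state=0),
    # Stand-in for XGBoost (assumption made to avoid an extra dependency).
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Mean 5-fold cross-validated F1 score for each classifier.
scores = {
    name: cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
    for name, clf in classifiers.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

If xgboost is installed, `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could replace the stand-in entry in the dictionary.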
Sloan Digital Sky Survey telescope SDSS
Large-scale structure in the northern equatorial slice of the SDSS main galaxy redshift sample. The slice is 2.5 degrees thick, and galaxies are color-coded by g-r color. [Photo credit: sdss-legacy]
This project builds on the regression repository. It mainly focuses on optimizing the hyper-parameters of the XGBoost regressor to best estimate the photometric redshifts of the quasars and star-forming galaxies under study. We used 80% (about one million spectroscopically confirmed SDSS sources) of the dataset for training the algorithm and 20% for testing. We ran scikit-learn's RandomizedSearchCV with r2_score, the median absolute deviation, and both together as the scoring metrics in different experiments (see GitHub) to find the best parameters for our testing data. Scoring with the median absolute deviation gives the best RMS and NMAD for this project.
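A randomized hyper-parameter search with a deviation-based scoring metric might look like the sketch below. Several things here are assumptions rather than details from this project: the data are synthetic, the search space is hypothetical, `GradientBoostingRegressor` stands in for the XGBoost regressor, and `nmad` uses one common definition of the normalised median absolute deviation, 1.4826 × median(|Δz − median(Δz)|) with Δz = (z_pred − z_true) / (1 + z_true).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import RandomizedSearchCV, train_test_split


def nmad(z_true, z_pred):
    """NMAD of dz = (z_pred - z_true) / (1 + z_true); one common definition."""
    z_true = np.asarray(z_true, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    dz = (z_pred - z_true) / (1.0 + z_true)
    return 1.4826 * np.median(np.abs(dz - np.median(dz)))


# Synthetic stand-in data: five fake photometric features and a fake,
# non-negative "redshift" that depends on the first two of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
z = np.abs(0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.05 * rng.normal(size=600))

# 80% for training, 20% for testing, as in the description above.
X_train, X_test, z_train, z_test = train_test_split(
    X, z, test_size=0.2, random_state=0
)

# Hypothetical search space, chosen only for illustration.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.03, 0.1, 0.3],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),  # stand-in for XGBoost
    param_distributions=param_distributions,
    n_iter=5,
    # Lower NMAD is better, so the scorer negates it internally.
    scoring=make_scorer(nmad, greater_is_better=False),
    cv=3,
    random_state=0,
)
search.fit(X_train, z_train)

z_pred = search.predict(X_test)
test_nmad = nmad(z_test, z_pred)
print("best parameters:", search.best_params_)
print("test NMAD:", test_nmad)
```

Passing `scoring="r2"` instead would reproduce the r2_score variant of the experiment. With xgboost installed, `xgboost.XGBRegressor` could replace the stand-in estimator, given a search space that uses its own parameter names.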