AS5001 Advanced Data Analysis
Academic year
2024 to 2025 Semester 1
Curricular information may be subject to change
Key module information
SCOTCAT credits
15
SCQF level
SCQF level 11
Availability restrictions
This module is intended for students in the final year of an MPhys or MSci programme involving the School, students on MSc Astrophysics, and students on EngD Photonics.
Module Staff
TBC
Module description
This module develops an understanding of basic concepts and offers practical experience with the techniques of quantitative data analysis. Beginning with fundamental concepts of probability theory and random variables, practical techniques are developed for using quantitative observational data to answer questions and test hypotheses about models of the physical world. The methods are illustrated by applications to the analysis of time series, imaging, spectroscopy, and tomography datasets. Students develop their computer programming skills, acquire a data analysis toolkit, and gain practical experience by analyzing real datasets.
Relationship to other modules
Pre-requisites
Familiarity with a scientific programming language is essential, for example through AS3013 or PH3080. Requires entry to an MPhys programme in the School or to MSc Astrophysics.
Assessment pattern
Coursework = 100%
Re-assessment
No Re-assessment available - laboratory based
Learning and teaching methods and delivery
Weekly contact
2 or 3 lectures or tutorials and some supervised computer lab sessions
Scheduled learning hours
30
Guided independent study hours
120
Additional information from school
AS5001 - Advanced Data Analysis
Overview
Astronomers and other physical scientists fit models to quantitative observational or experimental data in order to answer questions about the physical world. Data are always affected by measurement errors, leaving uncertainty in the answers to questions posed. Probability theory provides a precise language for discussing and expressing those uncertainties. Statistical data analysis provides practical tools for posing questions and teasing answers from the data. Analysis of real datasets is the best way to build expertise in quantitative data analysis.
Aims & Objectives
To develop an understanding of basic concepts and offer practical experience with the techniques of quantitative data analysis.
Learning Outcomes
By the end of the module, students should be comfortable with the concepts of probability theory and statistics, familiar with techniques for quantitative data analysis, and confident in their ability to tackle open-ended data analysis problems in physics & astronomy or wherever they may arise in their future work.
Synopsis
In this module, students will develop the creative problem-solving skills and workflows that will help them confidently approach their own open-ended research and analysis problems. For example, students will learn:
- how to make progress in the face of uncertainty and acquire domain knowledge from technical sources (textbooks, review articles, research articles).
- how to build an analysis workflow with research-grade toolchains (e.g., Git, Python, Astropy, NumPy, code profiling) and how to deal with the 'sharp edges' of realistic datasets (missing data, miscalibration, underestimated data uncertainties).
- the principles of data visualization and how to effectively convey technical results with impactful figures.
- probability theory, correlation, generative forward models, and Bayesian inference workflows.
- optimization/inference techniques such as automatic differentiation, stochastic gradient descent, and Markov chain Monte Carlo.
- non-parametric modelling techniques like Gaussian processes.
- machine learning techniques for dimensionality reduction.
- neural networks and their scientific applications.
- how to directly and plainly convey analytical results through good technical writing.
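As a flavour of the kind of workflow listed above, the following is a minimal sketch of Bayesian inference with a generative forward model and a random-walk Metropolis sampler (one simple form of Markov chain Monte Carlo). It is an illustration only, not module material: the straight-line model, the synthetic data, the flat prior, the step sizes, and all function names (`forward_model`, `log_posterior`, `metropolis`) are assumptions made for this example.

```python
import numpy as np


def forward_model(theta, x):
    """Hypothetical generative forward model: a straight line y = m*x + b."""
    m, b = theta
    return m * x + b


def log_posterior(theta, x, y, sigma):
    """Gaussian log-likelihood with an assumed flat prior (up to a constant)."""
    resid = (y - forward_model(theta, x)) / sigma
    return -0.5 * np.sum(resid**2)


def metropolis(log_prob, theta0, step, n_steps, rng, args=()):
    """Random-walk Metropolis sampler with a Gaussian proposal."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_prob(theta, *args)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * rng.normal(size=theta.size)
        lp_new = log_prob(proposal, *args)
        # Accept with probability min(1, p_new / p_old).
        if np.log(rng.uniform()) < lp_new - lp:
            theta, lp = proposal, lp_new
        chain[i] = theta
    return chain


# Synthetic data (assumed for illustration): true slope 2, intercept 1.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
sigma = 0.5
y = forward_model((2.0, 1.0), x) + sigma * rng.normal(size=x.size)

chain = metropolis(
    log_posterior,
    theta0=[0.0, 0.0],
    step=np.array([0.02, 0.1]),  # hand-tuned proposal widths (assumption)
    n_steps=20000,
    rng=rng,
    args=(x, y, sigma),
)

samples = chain[5000:]  # discard the first 5000 steps as burn-in
m_mean, b_mean = samples.mean(axis=0)
m_std, b_std = samples.std(axis=0)
print(f"slope     = {m_mean:.3f} +/- {m_std:.3f}")
print(f"intercept = {b_mean:.3f} +/- {b_std:.3f}")
```

The posterior means and standard deviations summarise the inferred parameters and their uncertainties. In practice, a well-tested sampler from an established library would usually be preferable to a hand-written one like this.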
Additional information
This module is continuously assessed via three to four projects spread evenly throughout the semester. In each project, students engage with the research literature and apply statistical concepts to research-grade datasets through a mixture of analytic and numerical work. This 15-credit module is expected to take 150 hours of study for the average student. There is no final exam, so students can expect to commit about 14 hours per week over weeks 1 to 11, including lectures and independent work on the assignments. Students are invited to use whichever programming tools they find most efficient for the assignments. However, because much of the modern data analysis and machine learning ecosystem is built in Python, students will likely find it most efficient to use Python for most, if not all, assignments.
Recommended Books
Please view University online record: