AS5001 Advanced Data Analysis

Academic year

2024 to 2025 Semester 1

Key module information

SCOTCAT credits

15

The Scottish Credit Accumulation and Transfer (SCOTCAT) system allows credits gained in Scotland to be transferred between institutions. The number of credits associated with a module gives an indication of the amount of learning effort required by the learner. European Credit Transfer System (ECTS) credits are half the value of SCOTCAT credits.

SCQF level

SCQF level 11

The Scottish Credit and Qualifications Framework (SCQF) provides an indication of the complexity of award qualifications and associated learning and operates on an ascending numeric scale from Levels 1-12 with SCQF Level 10 equating to a Scottish undergraduate Honours degree.

Availability restrictions

This module is intended for students in the final year of an MPhys or MSci programme involving the School, students on MSc Astrophysics, and students on EngD Photonics.

Module Staff

TBC

This information is given as indicative. Staff involved in a module may change at short notice depending on availability and circumstances.

Module description

This module develops an understanding of basic concepts and offers practical experience with the techniques of quantitative data analysis. Beginning with fundamental concepts of probability theory and random variables, practical techniques are developed for using quantitative observational data to answer questions and test hypotheses about models of the physical world. The methods are illustrated by applications to the analysis of time series, imaging, spectroscopy, and tomography datasets. Students develop their computer programming skills, acquire a data analysis toolkit, and gain practical experience by analyzing real datasets.

Relationship to other modules

Pre-requisites

Familiarity with a scientific programming language is essential, for example through AS3013 or PH3080. Entry to an MPhys programme in the School or to MSc Astrophysics is also required.

Assessment pattern

Coursework = 100%

Re-assessment

No Re-assessment available - laboratory based

Learning and teaching methods and delivery

Weekly contact

2 or 3 lectures or tutorials and some supervised computer lab sessions

Scheduled learning hours

30

The number of compulsory student:staff contact hours over the period of the module.

Guided independent study hours

120

The number of hours that students are expected to invest in independent study over the period of the module.

Additional information from school

AS5001 - Advanced Data Analysis

Overview

Astronomers and other physical scientists fit models to quantitative observational or experimental data in order to answer questions about the physical world. Data are always affected by measurement errors, leaving uncertainty in the answers to questions posed. Probability theory provides a precise language for discussing and expressing those uncertainties. Statistical data analysis provides practical tools for posing questions and teasing answers from the data. Analysis of real datasets is the best way to build expertise in quantitative data analysis.

Aims & Objectives

To develop an understanding of basic concepts and offer practical experience with the techniques of quantitative data analysis.

Learning Outcomes

By the end of the module, students should be comfortable with the concepts of probability theory and statistics, familiar with techniques for quantitative data analysis, and confident in their ability to tackle open-ended data analysis problems in physics & astronomy or wherever they may arise in their future work.

Synopsis

In this module, students will develop the creative problem-solving skills and workflows that will help them approach open-ended research and analysis problems with confidence. For example, students will learn:

  • how to make progress in the face of uncertainty and acquire domain knowledge from technical sources (textbooks, review articles, research articles).
  • how to build an analysis workflow with research-grade toolchains (e.g., Git, Python, Astropy, NumPy, code profiling) and how to deal with the 'sharp edges' of realistic datasets (missing data, miscalibration, underestimated data uncertainties).
  • the principles of data visualization and how to effectively convey technical results with impactful figures.
  • probability theory, correlation, generative forward models, and Bayesian inference workflows.
  • optimization and inference techniques such as auto-differentiation, stochastic gradient descent, and Markov chain Monte Carlo.
  • non-parametric modelling techniques like Gaussian processes.
  • machine learning techniques for dimensionality reduction.
  • neural networks and their scientific applications.
  • how to directly and plainly convey analytical results through good technical writing.
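As an informal illustration of how several of these topics fit together (a generative forward model, a Bayesian posterior, and Markov chain Monte Carlo), the sketch below fits a straight line to synthetic data with a basic Metropolis sampler. It is not taken from the module materials: the function names (forward_model, log_posterior, metropolis), the simulated dataset, the flat priors, and the step sizes are all assumptions chosen for the example, and it uses only NumPy.

```python
import numpy as np


def forward_model(x, slope, intercept):
    """Hypothetical generative model: a straight line."""
    return slope * x + intercept


def log_posterior(params, x, y, sigma):
    """Gaussian log-likelihood with flat priors (an assumption),
    dropping constants that do not depend on the parameters."""
    slope, intercept = params
    residuals = (y - forward_model(x, slope, intercept)) / sigma
    return -0.5 * np.sum(residuals**2)


def metropolis(log_prob, start, step, n_steps, rng):
    """Random-walk Metropolis sampler with Gaussian proposals."""
    current = np.asarray(start, dtype=float)
    current_lp = log_prob(current)
    chain = np.empty((n_steps, current.size))
    for i in range(n_steps):
        proposal = current + step * rng.normal(size=current.size)
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain


# Simulate data from the forward model with known measurement noise.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
sigma = 0.5
true_slope, true_intercept = 2.0, 1.0
y = forward_model(x, true_slope, true_intercept) + sigma * rng.normal(size=x.size)

# Sample the posterior, then discard an initial burn-in segment.
chain = metropolis(
    lambda p: log_posterior(p, x, y, sigma),
    start=[0.0, 0.0],
    step=np.array([0.02, 0.1]),
    n_steps=20000,
    rng=rng,
)
samples = chain[5000:]
slope_mean, intercept_mean = samples.mean(axis=0)
slope_std, intercept_std = samples.std(axis=0)

print(f"slope     = {slope_mean:.3f} +/- {slope_std:.3f}")
print(f"intercept = {intercept_mean:.3f} +/- {intercept_std:.3f}")
```

The posterior means should land close to the values used to simulate the data, with the spread of the samples giving the parameter uncertainties. With real datasets, the noise level is rarely known this well, which is where the 'sharp edges' mentioned above come in.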

Additional information

This module is continuously assessed via three to four projects spread evenly throughout the semester. In each project, the student engages with the research literature and applies statistical concepts to research-grade datasets through a mixture of analytic and numerical work. This 15-credit module is expected to take 150 hours of study for the average student. There is no final exam, so students can expect to commit about 14 hours per week over weeks 1 to 11, including lectures and independent work on the assignments. Students are invited to use whichever programming tools they find most efficient for the assignments. However, because much of the modern data analysis and machine learning ecosystem is built in Python, students will likely find it most efficient to use Python for most, if not all, assignments.

Recommended Books

Please view University online record:

https://sta.rl.talis.com/index.html