Metadata
Title
STU44003 – Data Analytics
Category
courses
UUID
637c51c139ba4ef887465b95893e4f3c
Source URL
https://teaching.scss.tcd.ie/module/stu44003-data-analytics/
Parent URL
https://www.maths.tcd.ie/undergraduate/modules/minor-stats.php
Crawl Time
2026-03-16T07:02:05+00:00
Rendered Raw Markdown
# STU44003 – Data Analytics

**Source**: https://teaching.scss.tcd.ie/module/stu44003-data-analytics/
**Parent**: https://www.maths.tcd.ie/undergraduate/modules/minor-stats.php

|  |  |
| --- | --- |
| **Module Code** | STU44003 |
| **Module Name** | Data Analytics |
| **ECTS Weighting** | 10 ECTS |
| **Semester taught** | Semester 1 & 2 |
| **Module Coordinator/s** | Profs. Alessio Benavoli (semester I) and Athanasios Georgiadis (semester II) |

## Module Learning Outcomes

On successful completion of this module, students will be able to:

LO1. Identify, compare and select appropriate analysis and modelling techniques for a range of applications.

LO2. Deploy and document an appropriate set of self-selected analysis techniques in response to the defined problem areas.

LO3. Use appropriate statistical packages (in either R or Python) to perform the analysis and to present and interpret its results effectively.

## Module Content

- Overview of the field
- Review of Probability Theory
- Introduction to Monte Carlo Methods and Simulation
- Review of Hypothesis Testing
- Analysis of Categorical Data
- Concepts of Information Theory: Entropy, Mutual Information, Conditional Entropy, and Information Gain
- Using CHAID in Classification Trees
- Using the Gini Index in Classification Trees
- Detailed Discussion of Classification and Regression Trees
- Overfitting and techniques to avoid it (Cross Validation, Bagging, Boosting, Random Forest, …)
- RuleFit Procedure and Model Evaluation
- Handling Unbalanced Datasets
- Concept of Similarity and Distance
- Distance Measures for Various Data Types
- Hierarchical Cluster Analysis
- Principal Component Analysis
- Concepts of Data Missingness and Its Mechanism
- Methods of Missing Data Imputation (MDI)
- Using the R package mice for MDI
- Nonparametric methods
- Introduction to Bayesian Statistics
- Examples of applications of Bayesian Statistics (Gibbs Sampling, …)
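
As an illustration of the information-theoretic and tree-splitting topics listed above (entropy, information gain, and the Gini index), the following is a minimal Python sketch. It is not taken from the module materials: the function names and the toy labels are hypothetical, and the formulas are the standard textbook definitions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(labels, groups):
    """Reduction in entropy when `labels` is split into `groups`."""
    n = len(labels)
    weighted_child_entropy = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted_child_entropy


# Hypothetical toy data: a balanced parent node split into two child nodes.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print("parent entropy:", entropy(parent))
print("parent Gini impurity:", gini(parent))
print("information gain of split:", information_gain(parent, [left, right]))
```

A classification tree built with either criterion would prefer the candidate split that most reduces impurity, so a larger information gain (or a larger drop in Gini impurity) indicates a better split under these definitions.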

## Teaching and Learning Methods

Lectures and lab sessions.

## Assessment Details

|  |  |  |  |  |  |
| --- | --- | --- | --- | --- | --- |
| **Assessment Component** | **Brief Description** | **Learning Outcomes Addressed** | **% of total** | **Week set** | **Week due** |
| Coursework | semester I | All | 20% |  |  |
| Coursework | semester II | All | 20% |  |  |
| Examination | in-person (2 hours) | All | 60% |  | Exam session in Semester 2 |

## Reassessment Details

Examination (2 hours, 100%)

## Contact Hours and Indicative Student Workload

|  |  |
| --- | --- |
| **Contact Hours (scheduled hours per student over full module), broken down by:** | **54 hours** |
| Lecture | 44 hours |
| Laboratory | 10 hours |
| **Independent study (outside scheduled contact hours), broken down by:** | **40 hours** |
| Preparation for classes and review of material (including preparation for examination, if applicable) | 30 hours |
| Completion of assessments (including examination, if applicable) | 10 hours |
| **Total Hours** | **94 hours** |

## Recommended Reading List

- Peter Bruce and Andrew Bruce, **Practical Statistics for Data Scientists**, O’Reilly, 2017
- Xin-She Yang, **Introduction to Algorithms for Data Mining and Machine Learning**, Academic Press, 2019
- Alan Agresti, **An Introduction to Categorical Data Analysis**, John Wiley and Sons, 2019
- Michael Greenacre and Raul Primicerio, **Multivariate Analysis of Ecological Data**, Fundación BBVA, 2013
- Max Kuhn and Kjell Johnson, **Applied Predictive Modeling**, Springer, 2013
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, **The Elements of Statistical Learning**, Springer, 2021
- Pratap Dangeti, **Statistics for Machine Learning**, Packt, 2017
- Gururajan Govindan, Shubhangi Hora, and Konstantin Palagachev, **The Data Analysis Workshop**, Packt, 2020
- Stef van Buuren, **Flexible Imputation of Missing Data**, CRC Press, 2018
- William M. Bolstad and James M. Curran, **Introduction to Bayesian Statistics**, Wiley, 2017

## Module Pre-requisites

**Prerequisite modules:** This is a year 4 module.

**Other/alternative non-module prerequisites:** NA

## Module Co-requisites

None

## Module Website

[Blackboard](https://tcd.blackboard.com/webapps/login/)