# STU44003 – Data Analytics
**Source**: https://teaching.scss.tcd.ie/module/stu44003-data-analytics/
**Parent**: https://www.maths.tcd.ie/undergraduate/modules/minor-stats.php
| | |
| --- | --- |
| **Module Code** | STU44003 |
| **Module Name** | Data Analytics |
| **ECTS Weighting** | 10 ECTS |
| **Semester Taught** | Semesters 1 and 2 |
| **Module Coordinator/s** | Prof. Alessio Benavoli (Semester 1) and Prof. Athanasios Georgiadis (Semester 2) |
## Module Learning Outcomes
On successful completion of this module, students will be able to:
- LO1. Identify, compare and select appropriate analysis and modelling techniques for a range of applications.
- LO2. Deploy and document an appropriate set of self-selected analysis techniques in response to defined problem areas.
- LO3. Demonstrate the use of appropriate statistical packages (in R or Python) to perform an analysis and to present and interpret its results effectively.
## Module Content
- Overview of the field
- Review of Probability Theory
- Introduction to Monte Carlo Methods and Simulation
- Review of Hypothesis Testing
- Analysis of Categorical Data
- Concepts from Information Theory: Entropy, Mutual Information, Conditional Entropy, and Information Gain
- Using CHAID in Classification Trees
- Using the Gini Index in Classification Trees
- Detailed Discussion of Classification and Regression Trees (CART)
- Overfitting and Techniques to Avoid It (Cross-Validation, Bagging, Boosting, Random Forests, …)
- The RuleFit Procedure and Model Evaluation
- Handling Imbalanced Datasets
- Concepts of Similarity and Distance
- Distance Measures for Various Data Types
- Hierarchical Cluster Analysis
- Principal Component Analysis
- Concepts of Data Missingness and Missingness Mechanisms
- Methods of Missing Data Imputation (MDI)
- Using the R Package mice for MDI
- Nonparametric Methods
- Introduction to Bayesian Statistics
- Examples of Applications of Bayesian Statistics (Gibbs Sampling, …)
## Teaching and Learning Methods
Lectures and lab sessions.
## Assessment Details
| **Assessment Component** | **Brief Description** | **Learning Outcomes Addressed** | **% of Total** | **Week Set** | **Week Due** |
| --- | --- | --- | --- | --- | --- |
| Coursework | Semester 1 | All | 20% | | |
| Coursework | Semester 2 | All | 20% | | |
| Examination | In-person, 2 hours | All | 60% | | Exam session in Semester 2 |
## Reassessment Details
Examination (2 hours, 100%)
## Contact Hours and Indicative Student Workload
| | |
| --- | --- |
| **Contact Hours (scheduled hours per student over full module), broken down by:** | **54 hours** |
| Lecture | 44 hours |
| Laboratory | 10 hours |
| **Independent study (outside scheduled contact hours), broken down by:** | **40 hours** |
| Preparation for classes and review of material (including preparation for examination, if applicable) | 30 hours |
| Completion of assessments (including examination, if applicable) | 10 hours |
| **Total Hours** | **94 hours** |
## Recommended Reading List
- Peter Bruce and Andrew Bruce, **Practical Statistics for Data Scientists**, O’Reilly, 2017
- Xin-She Yang, **Introduction to Algorithms for Data Mining and Machine Learning**, Academic Press, 2019
- Alan Agresti, **An Introduction to Categorical Data Analysis**, John Wiley and Sons, 2019
- Michael Greenacre and Raul Primicerio, **Multivariate Analysis of Ecological Data**, Fundación BBVA, 2013
- Max Kuhn and Kjell Johnson, **Applied Predictive Modeling**, Springer, 2013
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, **The Elements of Statistical Learning**, Springer, 2021
- Pratap Dangeti, **Statistics for Machine Learning**, Packt, 2017
- Gururajan Govindan, Shubhangi Hora, and Konstantin Palagachev, **The Data Analysis Workshop**, Packt, 2020
- Stef van Buuren, **Flexible Imputation of Missing Data**, CRC Press, 2018
- William M. Bolstad and James M. Curran, **Introduction to Bayesian Statistics**, Wiley, 2017
## Module Pre-requisites
**Prerequisite modules:** This is a year 4 module.
**Other/alternative non-module prerequisites:** NA
## Module Co-requisites
None
## Module Website
[Blackboard](https://tcd.blackboard.com/webapps/login/)