LT2202, Statistical methods
Course Plan
Class | Topics | Materials | Code |
Class 1 Lecturers: Charalambos (Haris) Themistocleous Room: T346 Mon 16/1 13:15 - 15:00 Notes: See also the current New York Times Magazine article: The Great A.I. Awakening. | Introduction to the class; Introduction to Machine Learning; Combinatorics; Computational Statistics using Python (& R) | Class 1 Presentation; Class 1 Presentation (printer friendly version; please prefer this version to save paper); Charalambos Themistocleous (2017). Introduction to R. Part A. Language Fundamentals (manuscript); Introduction to Python: Python Programming Language, Scipy/Numpy Quickstart Tutorial (see also Class 5 notes) | The code of the examples in the presentation (frequency lists, concordances, etc.) can be accessed here for those interested in seeing how they were created. |
Class 2 Lecturers: Charalambos (Haris) Themistocleous Room: T346 Thu 19/1 10:15 - 12:00 | Probability Theory: Introduction | Class 2 Presentation; Class 2 (printer friendly version) | |
Class 3 Lecturers: Charalambos (Haris) Themistocleous Room: Lab 4 Mon 23/1 10:15 - 12:00 Notes: Using probabilities in everyday decision making and how to avoid biases: Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124-1131. doi:10.1126/science.185.4157.1124 | Law of Total Probability; Independent vs. Dependent Events; Conditional Probability; Bayes' Theorem | Class 3 Presentation; Class 3 (printer friendly version) | |
Class 4 Lecturers: Charalambos (Haris) Themistocleous Room: T346 Thu 26/1 10:15 - 12:00 | Discrete Variables; Continuous Variables; Distributions; Bernoulli Distribution; Binomial Distribution; Hypergeometric Distribution; Random Variables | Class 4 Presentation; Class 4 (printer friendly version) | The code of the examples in the presentation (frequency lists, concordances, etc.) can be accessed here. |
Assignment: Task 1 | |||
Class 5 Lecturers: Charalambos (Haris) Themistocleous Room: Lab 4 Mon 30/1 10:15 - 12:00 | Computer Exercise 1: Distributions and random number generation based on a distribution | Class 5 Presentation; Class 5 (printer friendly version) | Code; Data |
Class 6 Lecturers: Charalambos (Haris) Themistocleous Room: T346 Thu 2/2 10:15 - 12:00 | Continuous Variables; Hypothesis Testing; Statistical Concepts; Linear Models; Linear Mixed-Effects Models | Class 6 Presentation; Class 6 (printer friendly version) | |
Assignment: Task 2 / Data | |||
Class 7 Lecturers: Charalambos (Haris) Themistocleous Room: Lab 4 Mon 6/2 10:15 - 12:00 | Information Theory; Entropy | Class 7 Presentation; Class 7 (printer friendly version) | |
Class 8 Lecturers: Charalambos (Haris) Themistocleous Room: Lab 4 Thu 9/2 10:15 - 12:00 | Machine Learning; Classification; Basic Concepts | Class 8 Presentation; Class 8 (printer friendly version) | Machine Learning: videos by Trevor Hastie and Rob Tibshirani; Scikit Learn: Machine Learning in Python; Working With Text Data |
Class 9 Lecturers: Mehdi Ghanimifard Room: Lab 4 Mon 13/2 10:15 - 12:00 | FIRST ASSIGNMENT LAB; Deadline: 23/02/17; Naive Bayes | Hints and sample code | ASSIGNMENT 1 |
Class 10 Lecturers: Mehdi Ghanimifard Room: Lab 4 Thu 16/2 10:15 - 12:00 | FIRST ASSIGNMENT LAB; Deadline: 23/02/17; Naive Bayes | ||
Class 11 Lecturers: Charalambos (Haris) Themistocleous Room: T346 Mon 20/2 10:15 - 12:00 | Machine Learning Approaches; Linear Discriminant Analysis; Functional Discriminant Analysis | Class 11 Presentation; Class 11 (printer friendly version) | Caret Package in R (used for demonstrating model comparison in class): https://topepo.github.io/caret/index.html |
Class 12 ASSIGNMENT 2 (Evaluation) Lecturers: Mehdi Ghanimifard Room: Lab 4 Thu 23/2 10:15 - 12:00 | SECOND ASSIGNMENT LAB; Deadline: 02/03/17 | ||
Class 13 Lecturers: Charalambos (Haris) Themistocleous Room: Lab 4 Mon 27/2 10:15 - 12:00 | Decision Trees; CART; C5.0; Evaluation | Class 13 Presentation; Class 13 (printer friendly version) | |
Class 14 Lecturers: Charalambos (Haris) Themistocleous Room: T346 Thu 2/3 10:15 - 12:00 | Markov Chains; Hidden Markov Models; Viterbi | Class 14 Presentation; Class 14 (printer friendly version) | HMM: Book chapter from Daniel Jurafsky & James H. Martin, Speech and Language Processing. |
Class 15 Lecturers: Mehdi Ghanimifard Room: Lab 4 Mon 6/3 10:15 - 12:00 | THIRD ASSIGNMENT LAB: Implementation of a part-of-speech tagger with the Viterbi algorithm. | Link to old instructions and extended material; Tagged corpora (ask for password) | |
Class 16 Lecturers: Charalambos (Haris) Themistocleous Room: T346 Thu 9/3 10:15 - 12:00 | Hidden Markov Models; Training and Evaluating HMMs | Class 16 Presentation; Class 16 (printer friendly version) | Viterbi Python Code |
Class 17 Lecturers: Mehdi Ghanimifard Room: Lab 4 Mon 13/3 10:15 - 12:00 | THIRD ASSIGNMENT LAB | ||
Class 18 Lecturers: Charalambos (Haris) Themistocleous Room: Lab 4 Thu 16/3 10:15 - 12:00 | On Unsupervised Machine Learning (Chatrine); Neural Networks; Deep Neural Networks | Class 18 Presentation; Class 18 (printer friendly version) | |
EXAMS 20/3 12:30 - 16:30 Room: T219. |
Course Literature
Course Books
- Christopher Manning and Hinrich Schütze (1999) Foundations of Statistical Natural Language Processing, Cambridge, Massachusetts, USA. MIT Press. Also see the book's supplemental materials website at Stanford.
- Joseph K. Blitzstein, Jessica Hwang (2014). Introduction to Probability. London: CRC Press. Taylor & Francis.
- Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer. Made available online by the authors here. Slides and videos for the Statistical Learning MOOC by Hastie and Tibshirani are available separately here. Slides and video tutorials related to this book by Abass Al Sharif can be downloaded here.
Complementary Textbooks
- Daniel Jurafsky and James H. Martin (2008) Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition. Prentice Hall.
- Russell, Stuart J.; Norvig, Peter (2009), Artificial Intelligence: A Modern Approach (3rd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-604259-7.
Resources
- The Classification And REgression Training (caret) package for R. http://topepo.github.io/caret/index.html
- The R Project for Statistical Computing
- Python Programming Language
- Scipy/Numpy Quickstart Tutorial
Course Description
7.5 higher education credits (hec), 2nd semester, 1st study period
The purpose of this course is to give an introduction to probabilistic modeling, statistical methods and their use within the field of language technology. The following topics will be covered in the course:
- Probability theory
- Information theory
- Statistical theory (sampling, estimation, hypothesis testing)
- Language modeling
- Part-of-speech tagging
- Syntactic parsing
- Word sense disambiguation
- Machine translation
- Evaluation
Elective course offered by the programme for students taking the one-year degree: Degree of Master of Arts (60 credits) in Language Technology (Filosofie magisterexamen i språkteknologi).
Course Syllabus
The full course syllabus, as adopted by the head of department, can be downloaded as a PDF.
Application
The course can be offered as a freestanding single subject course for students not on the MLT programme. Information on application deadlines and admissions is available in the university course catalogue.