Course Plan

Class 1
Lecturer: Charalambos (Haris) Themistocleous. Room: T346. Mon 16/1, 13:15 - 15:00.
Notes: See also the recent New York Times article "The Great A.I. Awakening".
Topics: Introduction to the class; Introduction to Machine Learning; Combinatorics; Computational Statistics using Python (& R).
Materials: Class 1 Presentation; Class 1 Presentation (printer friendly version¹); Charalambos Themistocleous (2017). Introduction to R. Part A. Language Fundamentals (manuscript); Introduction to Python: Python Programming Language; Scipy/Numpy Quickstart Tutorial (see also Class 5 notes).
¹Please prefer the printer friendly version to save paper.
Code: The code of the examples in the presentation (frequency lists, concordances, etc.) can be accessed here for those interested to see how they were created.
Class 2
Lecturer: Charalambos (Haris) Themistocleous. Room: T346. Thu 19/1, 10:15 - 12:00.
Topics: Probability Theory: Introduction.
Materials: Class 2 Presentation; Class 2 (printer friendly version).
Class 3
Lecturer: Charalambos (Haris) Themistocleous. Room: Lab 4. Mon 23/1, 10:15 - 12:00.
Notes: Using probabilities in everyday decision making and how to avoid biases: Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124-1131. doi:10.1126/science.185.4157.1124
Topics: Law of Total Probability; Independent vs. Dependent Events; Conditional Probability; Bayes' Theorem.
Materials: Class 3 Presentation; Class 3 (printer friendly version).
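The conditional-probability topics above can be illustrated with a short Python sketch (not part of the official course materials; the numbers are invented for illustration): a posterior computed with Bayes' theorem, with the denominator expanded by the law of total probability.

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B),
# where P(B) is expanded with the law of total probability:
# P(B) = P(B|A) * P(A) + P(B|~A) * P(~A).
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) given P(B|A), P(A), and P(B|~A)."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical diagnostic test: 99% sensitivity, 5% false positive
# rate, 1% base rate. The posterior is surprisingly low, the classic
# base-rate result discussed by Tversky & Kahneman.
print(round(posterior(0.99, 0.01, 0.05), 3))  # 0.167
```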
Class 4
Lecturer: Charalambos (Haris) Themistocleous. Room: T346. Thu 26/1, 10:15 - 12:00.
Topics: Discrete Variables; Continuous Variables; Distributions: Bernoulli, Binomial, Hypergeometric; Random Variables.
Materials: Class 4 Presentation; Class 4 (printer friendly version). The code of the examples in the presentation (frequency lists, concordances, etc.) can be accessed here.
Assignment: Task 1 
Class 5
Lecturer: Charalambos (Haris) Themistocleous. Room: Lab 4. Mon 30/1, 10:15 - 12:00.
Topics: Computer Exercise 1: Distributions and random number generation based on a distribution.
Materials: Class 5 Presentation; Class 5 (printer friendly version); Code; Data.
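The theme of Computer Exercise 1 (drawing random numbers from a given distribution) can be sketched with Python's standard library alone; the parameters below are invented, and this is not the exercise itself.

```python
import random

random.seed(1)  # fix the seed so the draws are reproducible

# Bernoulli(p): a single success/failure trial with success probability p.
bernoulli = 1 if random.random() < 0.3 else 0

# Binomial(n, p) simulated directly as a sum of n Bernoulli trials.
binomial = sum(1 for _ in range(10) if random.random() < 0.3)

# A draw from a continuous distribution: the standard normal.
normal = random.gauss(0.0, 1.0)

print(bernoulli in (0, 1), 0 <= binomial <= 10)
```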
Class 6
Lecturer: Charalambos (Haris) Themistocleous. Room: T346. Thu 2/2, 10:15 - 12:00.
Topics: Continuous Variables; Hypothesis Testing; Statistical Concepts; Linear Models; Linear Mixed Effects Models.
Materials: Class 6 Presentation; Class 6 (printer friendly version).
Assignment: Task 2 / Data
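The simplest case of the linear models covered in Class 6 can be sketched in a few lines of Python (an illustration with invented data, not course material): fitting y = a + b*x by ordinary least squares.

```python
def ols(xs, ys):
    """Simple linear regression y = a + b*x via least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    # Intercept: the fitted line passes through the mean point.
    return my - b * mx, b

# Points lying exactly on y = 1 + 2x are recovered exactly.
a, b = ols([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```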
Class 7
Lecturer: Charalambos (Haris) Themistocleous. Room: Lab 4. Mon 6/2, 10:15 - 12:00.
Topics: Information Theory; Entropy.
Materials: Class 7 Presentation; Class 7 (printer friendly version).
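The central quantity of Class 7, Shannon entropy, fits in one line of Python; a minimal sketch (not course material):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    # Terms with p = 0 contribute nothing (lim p*log p = 0), so skip them.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly one bit of uncertainty;
# a certain outcome carries none.
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([1.0]))       # 0.0
```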
Class 8
Lecturer: Charalambos (Haris) Themistocleous. Room: Lab 4. Thu 9/2, 10:15 - 12:00.
Topics: Machine Learning; Classification; Basic Concepts.
Materials: Class 8 Presentation; Class 8 (printer friendly version); Machine Learning - videos by Trevor Hastie and Rob Tibshirani; Scikit-Learn: Machine Learning in Python; Working With Text Data.
Class 9
Lecturer: Mehdi Ghanimifard. Room: Lab 4. Mon 13/2, 10:15 - 12:00.
FIRST ASSIGNMENT LAB. Deadline: 23/02/17.
Topics: Naive Bayes.
Materials: Hints and sample codes; ASSIGNMENT 1.
Class 10
Lecturer: Mehdi Ghanimifard. Room: Lab 4. Thu 16/2, 10:15 - 12:00.
FIRST ASSIGNMENT LAB. Deadline: 23/02/17.
Topics: Naive Bayes.
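The Naive Bayes classifier of Classes 9-10 can be sketched in plain Python (a toy illustration, not the assignment's sample code; the training documents are invented): multinomial Naive Bayes with add-one smoothing over word counts.

```python
import math
from collections import Counter

# Toy training data: (document tokens, class label); invented examples.
train = [
    (["good", "great", "fun"], "pos"),
    (["good", "good", "nice"], "pos"),
    (["bad", "boring"], "neg"),
    (["bad", "awful", "bad"], "neg"),
]

vocab = {w for doc, _ in train for w in doc}
docs_per_class = Counter(c for _, c in train)
counts = {c: Counter() for c in docs_per_class}
for doc, c in train:
    counts[c].update(doc)

def predict(doc):
    """Pick the class maximizing log P(c) + sum log P(w|c)."""
    def log_score(c):
        prior = math.log(docs_per_class[c] / len(train))
        # Add-one (Laplace) smoothing over the vocabulary.
        total = sum(counts[c].values()) + len(vocab)
        return prior + sum(
            math.log((counts[c][w] + 1) / total) for w in doc)
    return max(docs_per_class, key=log_score)

print(predict(["good", "fun"]))  # pos
```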
Class 11
Lecturer: Charalambos (Haris) Themistocleous. Room: T346. Mon 20/2, 10:15 - 12:00.
Topics: Machine Learning Approaches; Linear Discriminant Analysis; Functional Discriminant Analysis.
Materials: Class 11 Presentation; Class 11 (printer friendly version); Caret Package in R (used for demonstrating model comparison in class): https://topepo.github.io/caret/index.html
Class 12
Lecturer: Mehdi Ghanimifard. Room: Lab 4. Thu 23/2, 10:15 - 12:00.
SECOND ASSIGNMENT LAB. ASSIGNMENT 2 (Evaluation). Deadline: 02/03/17.
Class 13
Lecturer: Charalambos (Haris) Themistocleous. Room: Lab 4. Mon 27/2, 10:15 - 12:00.
Topics: Decision Trees; CART; C5.0; Evaluation.
Materials: Class 13 Presentation; Class 13 (printer friendly version).
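The splitting criterion that CART uses, Gini impurity, is easy to compute by hand; a minimal Python sketch (illustrative only, with invented labels):

```python
from collections import Counter

def gini(labels):
    """Gini impurity, the node-scoring criterion used by CART:
    1 - sum over classes of (class proportion)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# CART picks the split that most reduces impurity in the child nodes.
print(gini(["a", "a", "b", "b"]))  # 0.5, an even two-class mix
print(gini(["a", "a", "a", "a"]))  # 0.0, a pure node
```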
Class 14
Lecturer: Charalambos (Haris) Themistocleous. Room: T346. Thu 2/3, 10:15 - 12:00.
Topics: Markov Chains; Hidden Markov Models; Viterbi.
Materials: Class 14 Presentation; Class 14 (printer friendly version); HMM: book chapter from Daniel Jurafsky & James H. Martin, Speech and Language Processing.
Class 15
Lecturer: Mehdi Ghanimifard. Room: Lab 4. Mon 6/3, 10:15 - 12:00.
THIRD ASSIGNMENT LAB: Implementation of a part-of-speech tagger with the Viterbi algorithm.
Materials: Link to old instructions and extended material; Tagged corpora (ask for password).
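The core of the third assignment, Viterbi decoding over an HMM, can be sketched as a max-product dynamic program. This is an illustrative outline only, not the assignment solution; the two-tag model and all probabilities below are invented.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for obs under an HMM."""
    # V[t][s] = (probability of the best path ending in s at time t, path).
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        V.append({})
        for s in states:
            # Extend the best predecessor path into state s.
            prob, path = max(
                (V[-2][prev][0] * trans_p[prev][s] * emit_p[s][o],
                 V[-2][prev][1] + [s])
                for prev in states)
            V[-1][s] = (prob, path)
    return max(V[-1].values())[1]

# Hypothetical two-tag toy model (all numbers invented).
states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"fish": 0.6, "sleep": 0.4},
          "V": {"fish": 0.3, "sleep": 0.7}}

print(viterbi(["fish", "sleep"], states, start_p, trans_p, emit_p))
# ['N', 'V']
```

A real tagger would estimate the transition and emission probabilities from the tagged corpora and work in log space to avoid underflow.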
Class 16
Lecturer: Charalambos (Haris) Themistocleous. Room: T346. Thu 9/3, 10:15 - 12:00.
Topics: Hidden Markov Models; Training and Evaluating HMMs.
Materials: Class 16 Presentation; Class 16 (printer friendly version); Viterbi Python Code.
Class 17
Lecturer: Mehdi Ghanimifard. Room: Lab 4. Mon 13/3, 10:15 - 12:00.
THIRD ASSIGNMENT LAB.
Class 18
Lecturer: Charalambos (Haris) Themistocleous. Room: Lab 4. Thu 16/3, 10:15 - 12:00.
Topics: On Unsupervised Machine Learning (Chatrine); Neural Networks; Deep Neural Networks.
Materials: Class 18 Presentation; Class 18 (printer friendly version).
EXAMS: Mon 20/3, 12:30 - 16:30. Room: T219.

Course Literature

Course Books

  • Christopher Manning and Hinrich Schütze (1999) Foundations of Statistical Natural Language Processing, Cambridge, Massachusetts, USA. MIT Press. Also see the book's supplemental materials website at Stanford.
  • Joseph K. Blitzstein, Jessica Hwang (2014). Introduction to Probability. London: CRC Press. Taylor & Francis.
  • Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer. Available online by the authors here. Slides and videos for the Statistical Learning MOOC by Hastie and Tibshirani available separately here. Slides and video tutorials related to this book by Abass Al Sharif can be downloaded here.

Complementary Textbooks

  • Daniel Jurafsky and James Martin (2008). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition. Prentice Hall.
  • Russell, Stuart J.; Norvig, Peter (2009), Artificial Intelligence: A Modern Approach (3rd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-604259-7.

Resources

Course Description

7.5 hec, 2nd semester, 1st study period

The purpose of this course is to give an introduction to probabilistic modeling, statistical methods and their use within the field of language technology. The following topics will be covered in the course:

  • Probability theory
  • Information theory
  • Statistical theory (sampling, estimation, hypothesis testing)
  • Language modeling
  • Part-of-speech tagging
  • Syntactic parsing
  • Word sense disambiguation
  • Machine translation
  • Evaluation

Elective course offered by the programme for students taking the one-year degree: Degree of Master of Arts (60 credits) in Language Technology (Filosofie magisterexamen i språkteknologi).

Course Syllabus

The full course syllabus, as adopted by the head of department, can be downloaded as a PDF:

Course syllabus in English

Course syllabus in Swedish

Application

The course can be offered as a freestanding single subject course for students not on the MLT programme. Information on application deadlines and admissions is available in the university course catalogue: