Master TAL - MSc. NLP
Course Unit
Data Science
UE
803
EC
Hours
60h
Course Description
This course unit introduces fundamental techniques for the extraction, storage, cleaning, visualisation and analysis of data. We give a practical introduction to the tools and software libraries that allow data to be processed. Theoretical sessions are combined with programming exercises in which students put the software and concepts taught during the course into practice.
EC1 Storage and data extraction
This course aims to familiarize students with the techniques used to retrieve and analyse data that come from various sources (Excel files, databases, the web, tweets, etc.) and are available in different formats (XML, HTML, OWL, CSV, JSON, etc.). The course presents the most common data formats and explains how to retrieve data from the web, via APIs, from databases, and from knowledge bases.
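As an illustration of what working with several data formats can look like, here is a minimal Python sketch that reads the same two records from CSV, JSON, and XML text and brings them to one common structure. It is not taken from the course material: the sample data, the field names (`word`, `count`), and the function names are hypothetical, and only the Python standard library is assumed.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same two records in three formats.
CSV_TEXT = "word,count\nchat,3\nchien,5\n"
JSON_TEXT = '[{"word": "chat", "count": 3}, {"word": "chien", "count": 5}]'
XML_TEXT = (
    "<items>"
    "<item><word>chat</word><count>3</count></item>"
    "<item><word>chien</word><count>5</count></item>"
    "</items>"
)


def from_csv(text):
    """Parse CSV text into a list of {'word': str, 'count': int} records."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"word": row["word"], "count": int(row["count"])} for row in reader]


def from_json(text):
    """Parse a JSON array of objects into the same record structure."""
    return [{"word": r["word"], "count": int(r["count"])} for r in json.loads(text)]


def from_xml(text):
    """Parse <item> elements of an XML document into the same record structure."""
    root = ET.fromstring(text)
    return [
        {"word": item.findtext("word"), "count": int(item.findtext("count"))}
        for item in root.iter("item")
    ]


records = from_csv(CSV_TEXT)
print(records)
```

Once every source is converted to the same list of records, the later storage and analysis steps no longer depend on the original format.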
EC2 Data analysis
This course presents different ways in which data can be processed, analysed, and visualized. We describe how to pre-process data and summarize their content (distribution, variance, etc.), how to use these summaries to make predictions (classification, regression, clustering), and how to visualize data (histograms, heat maps, etc.).
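To give an idea of what summarizing and visualizing data involves, the following minimal Python sketch computes a mean, a variance, and a frequency distribution, then prints a text histogram. It is an illustration only, not course material: the sample values and the function names `summarize` and `text_histogram` are hypothetical, and only the standard library is assumed (in practice a plotting library would normally draw the histogram).

```python
import statistics
from collections import Counter

# Hypothetical sample: eight integer observations.
VALUES = [2, 4, 4, 4, 5, 5, 7, 9]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        # Population variance: mean squared deviation from the mean.
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
        # Distribution: how often each distinct value occurs.
        "distribution": Counter(values),
    }


def text_histogram(values):
    """Build one line per distinct value, with one '#' per occurrence."""
    counts = Counter(values)
    return [f"{value} | {'#' * counts[value]}" for value in sorted(counts)]


summary = summarize(VALUES)
print(summary["mean"], summary["variance"], summary["std_dev"])
for line in text_histogram(VALUES):
    print(line)
```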
Learning Outcome
- Identification of tools and libraries necessary for the collection of data
- Practical, real-life application of these tools and libraries
- Construction of normalised data banks
- Analysis of data
- Pre-processing and summarisation of data
- Visualisation of properties of a collection of data, visualisation and analysis of selected data
Prerequisites
- UE 701 and UE 703
Targeted Skills
- Know how to retrieve, structure, and represent data (sound, text, images, etc.)
- Exercise autonomy and initiative
More Information
Bibliography
- To be completed
Course URL – Arche
- To be completed
Link with other courses
- To be completed
Evaluation procedures
Number of Tests
- To be completed
Nature of the tests
- To be completed
Group work
- To be completed
Combined with other specializations
- No