Master TAL - MSc. NLP
Course Unit
Written Corpora
UE
702
EC
1
Hours
30h
Course Description
This course explores the methods and techniques used in NLP for building and using written corpora. It introduces corpus linguistics and the criteria that must be taken into account when collecting a corpus (size, variety, etc.), as well as the forms corpora may take (character encoding, XML, …). Other topics include corpus collection from the Web, the handling of different document formats, and the normalisation of badly collected or heterogeneous data.
Learning Outcomes
- Ability to design the content of a corpus of written documents
- Ability to normalise the data of a corpus
- Ability to identify principles and examples of corpus annotation
Prerequisites
- The first-semester courses of the master's programme have no prerequisites other than those defined for the specialisation
Targeted Skills
- Ability to collect, structure, and represent data (sound, text, images, …)
- Ability to combine and apply interdisciplinary skills and know-how with the aim of creating innovative solutions
More Information
Bibliography
- To be completed
Course URL – Arche
- To be completed
Links with other courses
- 702-EC2, 803 and 902-EC2
Evaluation procedures
Number of Tests
- 2
Nature of the tests
- labs
- final exam
Group work
- N/A
Combined with other specialisations
- No