Distributed Information Systems Laboratory LSIR

Language-Independent NLP Applications of Taxonomy Induction

Project Details

Laboratory: LSIR
Project type: Semester / Master
Status: Completed


The goal of this project is to design and implement applications that use lexical taxonomies to solve complex NLP tasks in a language-independent fashion.

A lexical taxonomy is a hierarchical organization of concepts. Such taxonomies have been shown to be useful in many natural language processing tasks, such as question answering, information retrieval, and textual entailment. Wikipedia, as the largest and most accurate collaboratively built semi-structured knowledge resource, has served as a major stepping stone towards automated taxonomy induction for multiple languages.

In this project, we build on our existing work, which produced the world's largest multilingual taxonomic resource, spanning over 280 languages [1]. More specifically, we wish to address the following research questions:

  • How can lexical taxonomies be used for corpus summarization?
  • How can lexical taxonomies be used for faceted search in information retrieval?
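
To make the second question concrete, here is a minimal sketch of taxonomy-driven faceting. It assumes a taxonomy stored as child-to-parent (hypernym) edges and documents already reduced to lists of terms. The toy entries and the names `PARENT`, `ancestors`, and `facet_documents` are hypothetical illustrations, not part of the resource described in [1].

```python
from collections import defaultdict

# Toy taxonomy as child -> parent (hypernym) edges. These entries are
# invented for illustration and are not taken from the resource in [1].
PARENT = {
    "sparrow": "bird",
    "eagle": "bird",
    "bird": "animal",
    "salmon": "fish",
    "fish": "animal",
}


def ancestors(term):
    """Return the hypernyms of `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def facet_documents(docs, depth_from_root=1):
    """Group document ids under the taxonomy node at a fixed depth below the root."""
    facets = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            path = [term] + ancestors(term)  # term ... root
            path.reverse()                   # root ... term
            # Terms that are unknown or too shallow yield no facet.
            if len(path) > depth_from_root:
                facets[path[depth_from_root]].add(doc_id)
    return {facet: sorted(ids) for facet, ids in facets.items()}


# Hypothetical documents, each represented by the terms it mentions.
DOCS = {
    "d1": ["sparrow", "eagle"],
    "d2": ["salmon"],
    "d3": ["eagle", "salmon"],
}

FACETS = facet_documents(DOCS)
print(FACETS)
```

In this sketch each document is filed under the ancestor one level below the root, so a search interface could offer "bird" and "fish" as facets. Because the grouping relies only on taxonomy edges, the same code would apply to a taxonomy in any language.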

The project requires programming skills in Python. Relevant experience in machine learning is preferred.

If you have any questions, send us an email or come to our office:

  • Amit Gupta (BC128): amit.gupta@epfl.ch

References: [1] Gupta, A., Lebret, R., Harkous, H., & Aberer, K. (2017). 280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification. arXiv preprint arXiv:1704.07624.

Contact: Amit Gupta