Distributed Information Systems Laboratory (LSIR)

Multilingual Taxonomy Induction from Wikipedia

Project Details


Laboratory: LSIR | Semester / Master | Completed


The goal of this project is to induce lexical taxonomies from Wikipedia for over 250 languages.

A lexical taxonomy is a hierarchical organization of concepts in which more specific concepts are linked to more general ones through is-a (hypernymy) relations. Taxonomies have been shown to be useful in many natural language processing tasks, such as question answering, information retrieval, and textual entailment. As one of the largest and most accurate collaboratively built, semi-structured knowledge resources, Wikipedia has served as a major stepping stone towards automated taxonomy induction for multiple languages.
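As a minimal illustration of the hierarchical organization described above, a taxonomy can be stored as a mapping from each concept to its direct hypernyms. More general ancestors are then found by following is-a links transitively. The sketch assumes a small hand-written taxonomy; the concepts and the function names `hypernyms` and `is_a` are hypothetical and are not part of any LSIR resource.

```python
from collections import deque

# Hypothetical toy taxonomy: each concept maps to its direct hypernyms (is-a parents).
# These concepts are illustrative only and are not taken from Wikipedia data.
TAXONOMY = {
    "painter": ["artist"],
    "sculptor": ["artist"],
    "artist": ["person"],
    "person": ["organism"],
    "river": ["body of water"],
    "body of water": ["geographical feature"],
}


def hypernyms(concept, taxonomy):
    """Return all ancestors of `concept` (transitive is-a closure), nearest first."""
    seen, order = set(), []
    queue = deque(taxonomy.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:  # a concept may be reachable through several parents
            continue
        seen.add(parent)
        order.append(parent)
        queue.extend(taxonomy.get(parent, []))
    return order


def is_a(concept, ancestor, taxonomy):
    """True if `ancestor` is a direct or indirect hypernym of `concept`."""
    return ancestor in hypernyms(concept, taxonomy)
```

For instance, `hypernyms("painter", TAXONOMY)` returns `["artist", "person", "organism"]`, so a question-answering system could confirm that a painter is a person. The breadth-first traversal with a `seen` set also handles concepts with several parents, which Wikipedia's category graph commonly contains.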

In this project, we aim to build on our existing work, which produced the world's largest multilingual taxonomic resource, covering over 280 languages [1]. More specifically, we wish to address two research questions:

  • Relation extraction: How can existing relation extraction approaches be generalized to all languages, and how can their performance be improved?
  • Relation classification: How can deep learning methods for classifying relations improve the accuracy of taxonomy induction?
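To make the classification question concrete, relation classification can be framed as a binary decision over (child, parent) category pairs: is the parent a hypernym of the child? The sketch below is a minimal baseline under stated assumptions, not a reproduction of [1] and not a deep model. It trains a logistic regression over character trigrams with plain stochastic gradient descent, in pure Python. The category pairs, their labels, and all function names (`char_ngrams`, `features`, `train`, `predict`) are hypothetical. A deep-learning approach would typically replace the hand-built trigram features and the linear scorer with a learned neural encoder.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (child category, parent category, label),
# where label 1 means the parent is a hypernym (is-a) of the child.
EXAMPLES = [
    ("Italian painters", "Painters", 1),
    ("French painters", "Painters", 1),
    ("Rivers of France", "Rivers", 1),
    ("Rivers of France", "France", 0),
    ("Italian painters", "Italy", 0),
    ("Paintings by Raphael", "Raphael", 0),
]


def char_ngrams(text, n=3):
    """Character n-grams of the lowercased text, padded with '<' and '>'."""
    padded = f"<{text.lower()}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def features(child, parent):
    """Binary features for a pair; child and parent n-grams use separate prefixes."""
    feats = {"bias"}
    feats.update("c:" + g for g in char_ngrams(child))
    feats.update("p:" + g for g in char_ngrams(parent))
    return feats


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.1):
    """SGD on the logistic loss, visiting the examples in list order each epoch."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for child, parent, label in examples:
            feats = features(child, parent)
            p = sigmoid(sum(weights[f] for f in feats))
            for f in feats:
                weights[f] += lr * (label - p)  # gradient step for binary features
    return weights


def predict(weights, child, parent):
    """Estimated probability that `parent` is a hypernym of `child`."""
    return sigmoid(sum(weights.get(f, 0.0) for f in features(child, parent)))


if __name__ == "__main__":
    w = train(EXAMPLES)
    print(round(predict(w, "German painters", "Painters"), 3))
```

In this toy set, parent-side trigrams such as `ers` (shared by "Painters" and "Rivers") appear only in positive pairs, so they receive positive weight. Surface cues of this kind need no language-specific tools, which is one reason character-level inputs are attractive across many languages. With only six examples, however, the model should not be expected to generalize.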

The project requires programming skills in Python. Relevant experience in machine learning is preferred.

If you have any questions, just drop us an email or come to our office:

  • Amit Gupta (BC128): amit.gupta@epfl.ch

References: [1] Gupta, A., Lebret, R., Harkous, H., & Aberer, K. (2017). 280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification. arXiv preprint arXiv:1704.07624.

Contact: Amit Gupta