Open Multilingual Wordnet

This page provides access to wordnets in a variety of languages, all linked to the Princeton WordNet of English (PWN). The goal is to make it easy to use wordnets in multiple languages. The individual wordnets have been made by many different projects and vary greatly in size and accuracy. This page has (i) extracted and normalized the data, (ii) linked it to Princeton WordNet 3.0 and (iii) put it in one place.

If you use these wordnets, please cite the original projects that created them (linked in Table 1). If you got value from this aggregation, please cite us (see below).

Documentation, News and Updates

Search

We have a simple search interface. It uses the SQL database developed by the Japanese Wordnet.

Available Wordnets
Wordnet                           Lang  Synsets  Words    Senses   Core  Licence          Data     Citation
Albanet                           als   4,676    5,990    9,602    31%   CC BY 3.0        als.zip  cite:als
Arabic WordNet (AWN)              arb   10,165   14,595   21,751   48%   CC BY SA 3.0     arb.zip  cite:arb
Chinese Wordnet (Taiwan)          cmn   4,913    3,206    8,069    28%   wordnet          cmn.zip  cite:cmn
DanNet                            dan   4,476    4,468    5,859    81%   wordnet          dan.zip  cite:dan
Princeton WordNet                 eng   117,659  148,730  206,978  100%  wordnet          eng.zip  cite:eng
Persian Wordnet                   fas   17,759   17,560   30,461   41%   Free to use      fas.zip  cite:fas
FinnWordNet                       fin   116,763  129,839  189,227  100%  CC BY 3.0        fin.zip  cite:fin
WOLF (Wordnet Libre du Français)  fra   59,091   55,373   102,671  92%   CeCILL-C         fra.zip  cite:fra
Hebrew Wordnet                    heb   5,448    5,325    6,872    27%   wordnet          heb.zip  cite:heb
MultiWordNet                      ita   34,728   40,343   61,558   83%   CC BY 3.0        ita.zip  cite:ita
Japanese Wordnet                  jpn   57,179   91,959   158,064  95%   wordnet          jpn.zip  cite:jpn
Multilingual Central Repository   cat   45,826   46,531   70,622   81%   CC BY 3.0        cat.zip  cite:cat
Multilingual Central Repository   eus   29,413   26,240   48,934   71%   CC BY-NC-SA 3.0  eus.zip  cite:eus
Multilingual Central Repository   glg   19,312   23,124   27,138   36%   CC BY 3.0        glg.zip  cite:glg
Multilingual Central Repository   spa   38,512   36,681   57,764   76%   CC BY 3.0        spa.zip  cite:spa
Wordnet Bahasa                    ind   51,755   64,948   142,488  99%   MIT              ind.zip  cite:ind
Wordnet Bahasa                    zsm   42,615   51,339   119,152  99%   MIT              zsm.zip  cite:zsm
Norwegian Wordnet                 nno   3,671    3,387    4,762    66%   wordnet          nno.zip  cite:nno
Norwegian Wordnet                 nob   4,455    4,186    5,586    81%   wordnet          nob.zip  cite:nob
plWordNet                         pol   14,008   18,860   21,001   30%   wordnet          pol.zip  cite:pol
OpenWN-PT                         por   41,810   52,220   68,285   79%   CC BY SA 3.0     por.zip  cite:por
Thai Wordnet                      tha   73,350   82,504   95,517   81%   wordnet          tha.zip  cite:tha

38 synsets shared from 117,661 (0%)

Language codes linked to Lewis, M. Paul (ed.), 2009. Ethnologue: Languages of the World, Sixteenth edition. Dallas, Tex.: SIL International. Online version: http://www.ethnologue.com/

Data for all of the wordnets.

Core refers to the percentage of synsets covered from the semi-automatically compiled list of 5000 "core" word senses in Princeton WordNet (approximately the 5000 most frequently used word senses). The original list is available at http://wordnetcode.princeton.edu/standoff-files/core-wordnet.txt (Boyd-Graber et al., 2006). We also provide our own version, converted to wn30 synsets.
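
As a rough illustration of how a Core figure of this kind could be computed, the sketch below takes the set of synsets in a wordnet and the set of core synsets and reports the share of core synsets that are covered. The function name core_coverage and the synset IDs are made up for the example; this is not the script used to build the table.

```python
def core_coverage(wordnet_synsets, core_synsets):
    """Return the percentage of core synsets that a wordnet covers."""
    core = set(core_synsets)
    if not core:
        raise ValueError("core synset list is empty")
    covered = core & set(wordnet_synsets)
    return 100.0 * len(covered) / len(core)


# Toy data: four made-up "core" synset IDs, three of which the wordnet covers.
core = ["00000001-n", "00000002-n", "00000003-v", "00000004-a"]
wordnet = ["00000001-n", "00000003-v", "00000004-a", "00000009-r"]

print(f"{core_coverage(wordnet, core):.0f}%")  # prints 75%
```

Synsets outside the core list (the last entry above) do not affect the figure, which is why a small wordnet can still score highly on Core.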

The wordnets are also linked to the Suggested Upper Merged Ontology (SUMO: Niles and Pease, 2001; Pease, 2011).

The fullest list of wordnets is the Global Wordnet Association's Wordnets in the World.

Mapping between wordnet versions was done using the mappings from TALP at UPC (Daudé et al., 2000).

Format

The wn-data-*.tab files are tab-separated files of synset-lemma pairs.

# name␉lang␉url␉license
offset-pos␉type␉lemma
offset-pos␉type␉lemma
...

name the name of the project
lang the ISO 639-3 three-letter code for the language
url the URL of the project
license a short name for the license
offset the 8-digit Princeton WordNet 3.0 synset offset
pos one of [a,v,n,r] (we treat 's' as 'a')
type the entry type, written lang:lemma (for example, tha:lemma)
lemma the lemma (word separator normalized to ' ')

Example:

# Thai	tha	http://th.asianwordnet.org/	wordnet 
13567960-n	tha:lemma	กระบวนการทรานแอมมิแนชัน
00155298-n	tha:lemma	การปฏิเสธ
14369530-n	tha:lemma	ภาวะการหายใจเร็วของทารกแรกเกิด
10850469-n	tha:lemma	เบธัน
11268326-n	tha:lemma	เรินต์เกน
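
A file in this format can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name read_wn_tab is an assumption, it keeps only rows whose type ends in ":lemma", and it folds pos 's' into 'a' as described above.

```python
import io
from collections import defaultdict


def read_wn_tab(lines):
    """Parse wn-data-*.tab content into (metadata, {synset: [lemmas]})."""
    meta = {}
    lemmas = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            # Header line: "# name<TAB>lang<TAB>url<TAB>license"
            if not meta:
                fields = [f.strip() for f in line.lstrip("#").split("\t")]
                meta = dict(zip(("name", "lang", "url", "license"), fields))
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip malformed rows
        synset, row_type, lemma = fields[0], fields[1], fields[2]
        if not row_type.endswith(":lemma"):
            continue  # assumption: only lemma rows are wanted here
        offset, _, pos = synset.rpartition("-")
        if pos == "s":
            pos = "a"  # satellite adjectives are treated as adjectives
        lemmas[f"{offset}-{pos}"].append(lemma.strip())
    return meta, dict(lemmas)


# Two rows taken from the Thai example above.
sample = (
    "# Thai\ttha\thttp://th.asianwordnet.org/\twordnet\n"
    "00155298-n\ttha:lemma\tการปฏิเสธ\n"
    "10850469-n\ttha:lemma\tเบธัน\n"
)
meta, lemmas = read_wn_tab(io.StringIO(sample))
print(meta["lang"], len(lemmas))  # prints: tha 2
```

For a real file, the same function can be given an open file handle, for example open(path, encoding="utf-8"), where path is wherever the unzipped data was saved.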

For this data to be really useful you need to combine it with the synset relations from the Princeton wordnet.
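
Because every wordnet uses the same offset-pos identifiers, combining the lemma data with the Princeton relations amounts to a lookup on that shared key. The sketch below shows the idea with toy data only: the synset IDs, the lemmas and the hypernyms table are all invented for illustration and are not real PWN entries, and hypernym_lemmas is a hypothetical helper name.

```python
def hypernym_lemmas(synset, lemmas, hypernyms):
    """Return the lemmas of each direct hypernym of `synset` that the wordnet covers."""
    found = []
    for parent in hypernyms.get(synset, []):
        found.extend(lemmas.get(parent, []))
    return found


# Lemmas keyed by "offset-pos", as they would come from a wn-data-*.tab file.
# These IDs and words are placeholders, not real data.
lemmas = {
    "00000001-n": ["animal"],
    "00000002-n": ["dog"],
}

# Hypothetical relation table (child synset -> parent synsets). In practice
# this would be taken from the Princeton WordNet 3.0 relations.
hypernyms = {
    "00000002-n": ["00000001-n"],
}

print(hypernym_lemmas("00000002-n", lemmas, hypernyms))  # prints ['animal']
```

A hypernym that the wordnet does not cover simply contributes no lemmas, which matters for the smaller wordnets in the table.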

Known Problems

References

als Ervin Ruci (2008)
On the current state of Albanet and related applications, Technical Report, University of Vlora
all Francis Bond and Kyonghee Paik (2012)
A survey of wordnets and their licenses. In Proceedings of the 6th Global WordNet Conference (GWC 2012), Matsue, pp. 64–71.
arb Black W., Elkateb S., Rodriguez H., Alkhalifa M., Vossen P., Pease A., Bertran M., Fellbaum C. (2006)
The Arabic WordNet Project, Proceedings of LREC 2006
cat, eus, glg, spa Aitor Gonzalez-Agirre, Egoitz Laparra and German Rigau (2012)
Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base. In Proceedings of the 6th Global WordNet Conference (GWC 2012) Matsue, Japan.
core Boyd-Graber, J., Fellbaum, C., Osherson, D., and Schapire, R. (2006)
Adding dense, weighted connections to WordNet. In: Proceedings of the Third Global WordNet Meeting, Jeju Island, Korea, January 2006
eng Christiane Fellbaum. (ed.) (1998)
WordNet: An Electronic Lexical Database, MIT Press
fra Benoit Sagot and Darja Fišer (2008)
Building a free French wordnet from multilingual resources. In Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
heb Noam Ordan and Shuly Wintner (2007)
Hebrew WordNet: a test case of aligning lexical databases across languages. International Journal of Translation 19(1):39–58, 2007
ita Emanuele Pianta, Luisa Bentivogli and Christian Girardi. (2002)
MultiWordNet: Developing an Aligned Multilingual Database. In Proceedings of the First International Conference on Global WordNet, Mysore, India, January 21-25, 2002, pp. 293-302.
ind, zsm Nurril Hirfana Mohamed Noor, Suerya Sapuan and Francis Bond (2011)
Creating the open Wordnet Bahasa. In Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), pages 258–267. Singapore.
jpn Hitoshi Isahara, Francis Bond, Kiyotaka Uchimoto, Masao Utiyama and Kyoko Kanzaki (2008)
Development of Japanese WordNet. In LREC-2008, Marrakech.
fas Montazery, Mortaza and Heshaam Faili (2010)
Automatic Persian WordNet Construction. In Proceedings of the 23rd International Conference on Computational Linguistics, pp. 846–850.
fin Lindén K., Carlson L. (2010)
FinnWordNet — WordNet på finska via översättning. LexicoNordica — Nordic Journal of Lexicography, 17:119–140.
mapp Jordi Daudé, Lluís Padró and German Rigau (2000)
Mapping WordNets Using Structural Information. 38th Annual Meeting of the Association for Computational Linguistics (ACL'2000), Hong Kong
pol Maciej Piasecki, Stanisław Szpakowicz and Bartosz Broda. (2009)
A Wordnet from the Ground Up. Wroclaw: Oficyna Wydawnicza Politechniki Wroclawskiej, Poland.
por Valeria de Paiva and Alexandre Rademaker (2012)
Revisiting a Brazilian wordnet. In Proceedings of Global Wordnet Conference, Matsue. Global Wordnet Association. (also with Gerard de Melo's contribution)
sumo Adam Pease (2011)
Ontology: A Practical Guide. Articulate Software Press, Angwin, CA. ISBN 978-1-889455-10-5.
sumo Niles, I and Adam Pease (2001)
Toward a Standard Upper Ontology. In Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Chris Welty and Barry Smith, eds.
tha Thoongsup S., Charoenporn T., Robkop K., Sinthurahat T., Mokarat C., Sornlertlamvanich V., Isahara H. (2009)
Thai Wordnet Construction. In Proceedings of the 7th Workshop on Asian Language Resources (ALR7), Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and the 4th International Joint Conference on Natural Language Processing (IJCNLP), Suntec, Singapore.
Francis Bond <bond@ieee.org>
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303