Open Multilingual Wordnet
This page provides access to wordnets in a variety of languages, all linked to the Princeton Wordnet of English (PWN). The goal is to make it easy to use wordnets in multiple languages. The individual wordnets have been made by many different projects and vary greatly in size and accuracy. This page has (i) extracted and normalized the data, (ii) linked it to Princeton WordNet 3.0, and (iii) put it all in one place.
If you use these wordnets, please cite the original projects that created them (linked in Table 1). If you found this aggregation useful, please also cite us (see below).
Documentation, News and Updates
Search
We have a simple search interface. It uses the SQL database developed by the Japanese Wordnet project.
Available Wordnets
| Wordnet | Lang | Synsets | Words | Senses | Core | Licence | Data | Citation |
| Albanet | als | 4,676 | 5,990 | 9,602 | 31% | CC BY 3.0 | als.zip | cite:als |
| Arabic WordNet (AWN) | arb | 10,165 | 14,595 | 21,751 | 48% | CC BY-SA 3.0 | arb.zip | cite:arb |
| Chinese Wordnet (Taiwan) | cmn | 4,913 | 3,206 | 8,069 | 28% | wordnet | cmn.zip | cite:cmn |
| DanNet | dan | 4,476 | 4,468 | 5,859 | 81% | wordnet | dan.zip | cite:dan |
| Princeton WordNet | eng | 117,659 | 148,730 | 206,978 | 100% | wordnet | eng.zip | cite:eng |
| Persian Wordnet | fas | 17,759 | 17,560 | 30,461 | 41% | Free to use | fas.zip | cite:fas |
| FinnWordNet | fin | 116,763 | 129,839 | 189,227 | 100% | CC BY 3.0 | fin.zip | cite:fin |
| WOLF (Wordnet Libre du Français) | fra | 59,091 | 55,373 | 102,671 | 92% | CeCILL-C | fra.zip | cite:fra |
| Hebrew Wordnet | heb | 5,448 | 5,325 | 6,872 | 27% | wordnet | heb.zip | cite:heb |
| MultiWordNet | ita | 34,728 | 40,343 | 61,558 | 83% | CC BY 3.0 | ita.zip | cite:ita |
| Japanese Wordnet | jpn | 57,179 | 91,959 | 158,064 | 95% | wordnet | jpn.zip | cite:jpn |
| Multilingual Central Repository | cat | 45,826 | 46,531 | 70,622 | 81% | CC BY 3.0 | cat.zip | cite:cat |
| Multilingual Central Repository | eus | 29,413 | 26,240 | 48,934 | 71% | CC BY-NC-SA 3.0 | eus.zip | cite:eus |
| Multilingual Central Repository | glg | 19,312 | 23,124 | 27,138 | 36% | CC BY 3.0 | glg.zip | cite:glg |
| Multilingual Central Repository | spa | 38,512 | 36,681 | 57,764 | 76% | CC BY 3.0 | spa.zip | cite:spa |
| Wordnet Bahasa | ind | 51,755 | 64,948 | 142,488 | 99% | MIT | ind.zip | cite:ind |
| Wordnet Bahasa | zsm | 42,615 | 51,339 | 119,152 | 99% | MIT | zsm.zip | cite:zsm |
| Norwegian Wordnet | nno | 3,671 | 3,387 | 4,762 | 66% | wordnet | nno.zip | cite:nno |
| Norwegian Wordnet | nob | 4,455 | 4,186 | 5,586 | 81% | wordnet | nob.zip | cite:nob |
| plWordNet | pol | 14,008 | 18,860 | 21,001 | 30% | wordnet | pol.zip | cite:pol |
| OpenWN-PT | por | 41,810 | 52,220 | 68,285 | 79% | CC BY-SA 3.0 | por.zip | cite:por |
| Thai Wordnet | tha | 73,350 | 82,504 | 95,517 | 81% | wordnet | tha.zip | cite:tha |
38 of 117,661 synsets (0%) are shared by all of the wordnets.
Language codes are linked to Ethnologue: Lewis, M. Paul (ed.), 2009. Ethnologue: Languages of the World, Sixteenth edition. Dallas, Tex.: SIL International. Online version: http://www.ethnologue.com/
Data for all of the wordnets.
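The shared-synset figure above is, in effect, a set intersection over the synsets of every wordnet. A minimal sketch, assuming each wordnet has already been loaded as a set of `offset-pos` synset ids; the function names and the toy ids below are illustrative and are not part of the distributed data or tools:

```python
def shared_synsets(wordnets: dict[str, set[str]]) -> set[str]:
    """Synset ids (e.g. '13567960-n') present in every wordnet."""
    if not wordnets:
        return set()
    # set.intersection accepts any number of sets and returns a new set
    return set.intersection(*wordnets.values())


def shared_summary(wordnets: dict[str, set[str]], total: int) -> tuple[int, int]:
    """Number of shared synsets, and that number as a rounded percentage of `total`."""
    shared = shared_synsets(wordnets)
    return len(shared), round(100 * len(shared) / total)


# Toy data: three tiny "wordnets" keyed by language code.
toy = {
    "eng": {"00001740-n", "00002137-n", "13567960-n"},
    "tha": {"13567960-n", "00155298-n"},
    "jpn": {"13567960-n", "00002137-n"},
}
print(shared_summary(toy, 117_661))  # (1, 0)
```

With the real figures, 100 × 38 / 117,661 is about 0.03, which rounds to the 0% shown above.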
Core refers to the percentage of synsets covered from the semi-automatically compiled list of 5000 "core" word senses in Princeton WordNet (approximately the 5000 most frequently used word senses). The original list is available from http://wordnetcode.princeton.edu/standoff-files/core-wordnet.txt (Boyd-Graber et al., 2006). We also provide our own version, converted to WordNet 3.0 (wn30) synsets.
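The Core column can be read as a ratio: the number of core synsets a wordnet covers, divided by the number of core synsets. A minimal sketch, assuming the core list has already been converted to a set of WordNet 3.0 `offset-pos` ids (parsing the core file itself is not shown); rounding to the nearest whole percent is an assumption about how the table was produced, and the ids are toy values:

```python
def core_coverage(wordnet_synsets: set[str], core_synsets: set[str]) -> int:
    """Percentage of core synsets that also appear in a wordnet, rounded."""
    if not core_synsets:
        raise ValueError("the core synset list is empty")
    covered = wordnet_synsets & core_synsets  # core synsets this wordnet has
    return round(100 * len(covered) / len(core_synsets))


core = {"00001740-n", "00002137-n", "00155298-n", "13567960-n"}  # toy core list
thai = {"13567960-n", "00155298-n", "99999999-n"}                 # toy wordnet
print(core_coverage(thai, core))  # 50
```

Synsets outside the core list (like the toy `99999999-n`) do not affect the score, which is why a small wordnet can still score well if it concentrates on frequent senses.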
The wordnets are also linked to the Suggested Upper Merged Ontology (SUMO; Niles and Pease, 2001; Pease, 2011).
The most complete list of wordnets is the Global Wordnet Association's Wordnets in the World.
Mapping between wordnet versions was done using the mappings from TALP at UPC (Daudé et al., 2000).
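Applying a version mapping amounts to looking up each old synset id in a table of candidate new ids. The sketch below assumes the mapping has already been loaded into a dict from source id to a list of `(target_id, score)` candidates and keeps the highest-scoring candidate; both the data shape and the keep-the-best policy are assumptions for illustration, not a description of the TALP files or of how this page applied them. The ids are hypothetical:

```python
def map_synsets(ids, mapping):
    """Map synset ids from one WordNet version to another.

    `mapping` (assumed shape): source id -> list of (target id, score).
    In this sketch, ids with no candidates are simply dropped.
    """
    result = {}
    for sid in ids:
        candidates = mapping.get(sid, [])
        if candidates:
            # keep the candidate with the highest score
            target, _score = max(candidates, key=lambda c: c[1])
            result[sid] = target
    return result


# Hypothetical ids: "old-*" in the source version, "new-*" in WordNet 3.0.
toy_mapping = {"old-1": [("new-7", 0.4), ("new-9", 0.9)], "old-2": []}
print(map_synsets(["old-1", "old-2", "old-3"], toy_mapping))  # {'old-1': 'new-9'}
```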
Format
The wn-data-*.tab files are tab-separated files of synset-lemma pairs: a one-line header followed by one line per lemma.
# name␉lang␉url␉license
offset-pos␉type␉lemma
offset-pos␉type␉lemma
...
| name | the name of the project |
| lang | the ISO 639-3 (three-letter) code for the language |
| url | the URL of the project |
| license | a short name for the license |
| offset | the 8-digit Princeton WordNet 3.0 offset |
| pos | one of [a,v,n,r] (we treat 's' as 'a') |
| type | the kind of entry, written lang:lemma (e.g. tha:lemma) |
| lemma | the lemma (word separator normalized to ' ') |
Example:
# Thai tha http://th.asianwordnet.org/ wordnet
13567960-n tha:lemma กระบวนการทรานแอมมิแนชัน
00155298-n tha:lemma การปฏิเสธ
14369530-n tha:lemma ภาวะการหายใจเร็วของทารกแรกเกิด
10850469-n tha:lemma เบธัน
11268326-n tha:lemma เรินต์เกน
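A reader for this format only needs to split each line on tabs. A minimal sketch, assuming the layout described above; keeping only rows whose type ends in `:lemma` is an assumption based on the example, and the function and constant names are hypothetical:

```python
from collections import defaultdict

HEADER_FIELDS = ("name", "lang", "url", "license")


def read_wn_data(lines):
    """Parse the lines of a wn-data-*.tab file.

    Returns (header, synsets): header is a dict built from the first '#'
    line (or None), and synsets maps 'offset-pos' to a list of lemmas.
    """
    header = None
    synsets = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            fields = line[1:].strip().split("\t")
            if header is None and len(fields) == len(HEADER_FIELDS):
                header = dict(zip(HEADER_FIELDS, fields))
            continue
        synset, kind, lemma = line.split("\t", 2)
        offset, pos = synset.split("-")
        if pos == "s":  # satellite adjectives are treated as 'a'
            pos = "a"
        if kind.endswith(":lemma"):  # assumption: only lemma rows are kept
            synsets[f"{offset}-{pos}"].append(lemma)
    return header, dict(synsets)


sample = [
    "# Thai\ttha\thttp://th.asianwordnet.org/\twordnet\n",
    "13567960-n\ttha:lemma\tกระบวนการทรานแอมมิแนชัน\n",
    "00155298-n\ttha:lemma\tการปฏิเสธ\n",
]
header, synsets = read_wn_data(sample)
print(header["lang"], len(synsets))  # tha 2
```

Because the function accepts any iterable of lines, a downloaded file can be passed straight in, e.g. `with open("wn-data-tha.tab", encoding="utf-8") as f: header, synsets = read_wn_data(f)` (the file name here just follows the wn-data-*.tab pattern).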
On its own, this data only gives the lemmas for each synset. For it to be really useful, you need to combine it with the synset relations (such as hypernymy) from the Princeton WordNet.
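As an illustration of that combination, the sketch below walks a synset's hypernym chain and attaches target-language lemmas at each step, so gaps in a wordnet show up as empty lists. It assumes the PWN 3.0 hypernym links have already been extracted into a dict keyed by the same `offset-pos` ids (for example from the PWN database files or a library such as NLTK; that step is not shown). The function name and toy ids are hypothetical:

```python
def hypernym_path(synset, hypernyms, lemmas):
    """Walk up the hypernym chain from `synset`, pairing each id with its lemmas.

    hypernyms: 'offset-pos' -> list of hypernym ids (from PWN 3.0)
    lemmas:    'offset-pos' -> list of lemmas in one language (from wn-data)
    Only the first hypernym is followed; a seen-set guards against cycles.
    """
    path, seen = [], set()
    current = synset
    while current is not None and current not in seen:
        seen.add(current)
        path.append((current, lemmas.get(current, [])))
        parents = hypernyms.get(current, [])
        current = parents[0] if parents else None
    return path


# Toy relation and lemma tables with made-up ids.
toy_hypernyms = {"a-n": ["b-n"], "b-n": ["c-n"]}
toy_lemmas = {"a-n": ["x"], "c-n": ["z"]}
print(hypernym_path("a-n", toy_hypernyms, toy_lemmas))
# [('a-n', ['x']), ('b-n', []), ('c-n', ['z'])]
```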
Known Problems
- We discard any synsets not linked to PWN (such as new synsets in the Arabic wordnet). The Global Wordnet Association (including us) is working to build a better version that can handle such synsets.
- If a wordnet has a different structure from PWN, we only show those concepts with synonymous or near-synonymous links to PWN. So for Danish, Polish and Norwegian, we only have a small subset of the entire wordnet.
- We currently can't add wordnets that don't link to PWN (such
as Gaelic and Hindi).
- We currently add only lemmas and discard extra information, such as:
  - definitions and examples from wordnets such as the Japanese and Spanish ones
  - orthographic variation and pronunciation in the Hebrew Wordnet
  We plan to add this information as we need it.
- We should strip diacritics from the Arabic wordnet to make lookup easier.
- We may yet be missing some available wordnets: please help us add
more. Any wordnet with an open license that links to the
Princeton Wordnet is welcome.
- The interface is not very multilingual.
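One way to make Arabic lookup diacritic-insensitive, as suggested in the list above, is to strip combining marks from both the indexed lemmas and the query. A minimal sketch using Python's standard `unicodedata` module; treating every category-Mn character as removable, and leaving hamza forms alone, are choices made for this sketch rather than anything the Arabic WordNet prescribes:

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop nonspacing combining marks (Unicode category 'Mn').

    For Arabic this removes short-vowel marks such as fatha (U+064E),
    as well as shadda and sukun. Precomposed letters such as alef with
    hamza above (U+0623) are kept, because the text is not decomposed first.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Build a lookup index keyed by the stripped form (toy entry: kataba, "he wrote").
index = {strip_diacritics("\u0643\u064e\u062a\u064e\u0628\u064e"): "toy-synset-id"}
query = "\u0643\u062a\u0628"  # the same word typed without vowel marks
print(index.get(strip_diacritics(query)))  # toy-synset-id
```

Applying `unicodedata.normalize("NFD", ...)` first would additionally split hamza off letters like U+0623 and remove it; whether that is desirable for lookup is a separate decision.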
References
- als: Ervin Ruci (2008). On the current state of Albanet and related applications. Technical Report, University of Vlora.
- all: Francis Bond and Kyonghee Paik (2012). A survey of wordnets and their licenses. In Proceedings of the 6th Global WordNet Conference (GWC 2012), Matsue, pp. 64–71.
- arb: Black W., Elkateb S., Rodriguez H., Alkhalifa M., Vossen P., Pease A., Bertran M., Fellbaum C. (2006). The Arabic WordNet Project. In Proceedings of LREC 2006.
- cat, eus, glg, spa: Aitor Gonzalez-Agirre, Egoitz Laparra and German Rigau (2012). Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base. In Proceedings of the 6th Global WordNet Conference (GWC 2012), Matsue, Japan.
- core: Boyd-Graber, J., Fellbaum, C., Osherson, D., and Schapire, R. (2006). Adding dense, weighted connections to WordNet. In Proceedings of the Third Global WordNet Meeting, Jeju Island, Korea, January 2006.
- eng: Christiane Fellbaum (ed.) (1998). WordNet: An Electronic Lexical Database. MIT Press.
- fra: Benoit Sagot and Darja Fišer (2008). Building a free French wordnet from multilingual resources. In European Language Resources Association (ELRA) (ed.), Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco.
- heb: Noam Ordan and Shuly Wintner (2007). Hebrew WordNet: a test case of aligning lexical databases across languages. International Journal of Translation 19(1):39–58.
- ita: Emanuele Pianta, Luisa Bentivogli and Christian Girardi (2002). MultiWordNet: Developing an Aligned Multilingual Database. In Proceedings of the First International Conference on Global WordNet, Mysore, India, January 21–25, 2002, pp. 293–302.
- ind, zsm: Nurril Hirfana Mohamed Noor, Suerya Sapuan and Francis Bond (2011). Creating the open Wordnet Bahasa. In Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), Singapore, pp. 258–267.
- jpn: Hitoshi Isahara, Francis Bond, Kiyotaka Uchimoto, Masao Utiyama and Kyoko Kanzaki (2008). Development of Japanese WordNet. In LREC-2008, Marrakech.
- fas: Mortaza Montazery and Heshaam Faili (2010). Automatic Persian WordNet Construction. In Proceedings of the 23rd International Conference on Computational Linguistics, pp. 846–850.
- fin: Lindén K., Carlson L. (2010). FinnWordNet – WordNet på finska via översättning [FinnWordNet: WordNet in Finnish via translation]. LexicoNordica: Nordic Journal of Lexicography, 17:119–140.
- mapp: Jordi Daudé, Lluís Padró and German Rigau (2000). Mapping WordNets Using Structural Information. In 38th Annual Meeting of the Association for Computational Linguistics (ACL 2000), Hong Kong.
- pol: Maciej Piasecki, Stanisław Szpakowicz and Bartosz Broda (2009). A Wordnet from the Ground Up. Wrocław: Oficyna Wydawnicza Politechniki Wrocławskiej, Poland.
- por: Valeria de Paiva and Alexandre Rademaker (2012). Revisiting a Brazilian wordnet. In Proceedings of the Global Wordnet Conference, Matsue. Global Wordnet Association. (Also with Gerard de Melo's contribution.)
- sumo: Adam Pease (2011). Ontology: A Practical Guide. Articulate Software Press, Angwin, CA. ISBN 978-1-889455-10-5.
- sumo: Ian Niles and Adam Pease (2001). Toward a Standard Upper Ontology. In Chris Welty and Barry Smith (eds.), Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001).
- tha: Thoongsup S., Charoenporn T., Robkop K., Sinthurahat T., Mokarat C., Sornlertlamvanich V., Isahara H. (2009). Thai Wordnet Construction. In Proceedings of the 7th Workshop on Asian Language Resources (ALR7), joint conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and the 4th International Joint Conference on Natural Language Processing (IJCNLP), Suntec, Singapore.
Francis Bond
<bond@ieee.org>
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Level 3, Room 55, 14 Nanyang Drive, Singapore 637332
Tel: (+65) 6592 1568; Fax: (+65) 6794 6303