UniKlu West Balkan Corpora Page


Dear Visitors and Colleagues,

If you use this webpage as a starting point for further research, please cite it as:

Dobrić N. (2012) Language Corpora in The West Balkans – History, Current State and Future Perspective. Slavisticna revija No. 60, Vol. 4, pp. 677–692.


compiled by Nikola Dobric

Corpora and language technologies in the West Balkans 
  List of West Balkan South Slavic Online Corpora and Language Resources (PDF)

The West Balkans has a rich history of developing language corpora. The first electronic corpus in the region was created only a few years after the very first one in the world, and the development of corpus resources dates back even a decade earlier. This early work was somewhat hampered by the unfortunate events of the 1990s, but the last two decades have seen substantial progress for the West Balkan languages. The following list presents the language corpora available online (or in progress) for the region's languages, as well as some of the more prominent linguistic institutions that develop them.


Corpora of BCS
  Andrić Initiative Corpus
  Branko Ćopić Corpus
  Gralis Corpus

Corpora of Bosnian language
  The Oslo Corpus of Bosnian Texts

Corpora of Croatian language
  Croatian Dependency Treebank
  Croatian Language Corpus
  Croatian Language Repository
  Croatian Morphological Lexicon
  Croatian National Corpus
  Intratext collection of religious texts in Croatian
  The Croatian Conference of Bishops corpus
  Silvije Strahimir Kranjčević corpus

Corpora of Macedonian language
  Monolingual and multilingual Gralis Corpus of Macedonian language
  The Online Macedonian Electronic Text Corpus

Corpora of Montenegrin language

Corpora of Serbian language
  SrpKorp Corpus of Contemporary Serbian
  Corpus of Serbian Language
  Rastko project

Corpora of Slovenian language
  Beseda Corpus
  Ciril Kosmač Corpus
  Collection of Slovenian literary texts
  Goo corpus of historical Slovene
  GOS corpus of spoken Slovene
  IMP language resources for historical Slovene
  JOS corpus
  KoRP Corpus of PR texts
  Learner Corpus of Spoken Slovene
  Nova Beseda Corpus
  Slovene Dependency Treebank
  POS-tagged Core Corpus
  Verbal Attacks on YNA Corpus
  Web corpora, lexicons and tools

Institutes, research centres and language servers
  Croatian Language Technologies
  Fran Ramovš Institute of the Slovenian Language
  Institute for Linguistics Zagreb
  Institute Jozef Stefan
  Language Technologies – Resources and Tools for Serbian
  Slovene Natural Language Server
  Slovenian Society for Language Technologies
  SRCE Institute
  Web corpora, lexicons and tools

© 2009 Alpen-Adria-Universität Klagenfurt
Responsible for the content of this page: Christina Obermann