Please use the following text to cite this item:
Ljubešić, Nikola and Tadić, Marko, 2013, CESAR Aligned Wikipedia Headwords List, http://hdl.handle.net/20.500.14615/2-33
dc.contributor.author: Ljubešić, Nikola
dc.contributor.author: Tadić, Marko
dc.date.accessioned: 2025-09-18T06:48:50Z
dc.date.available: 2025-09-18T06:48:50Z
dc.date.issued: 2013
dc.description: The 762,662 entries of the lexicon are built from the Wikipedia dumps of the six CESAR languages, using article titles and interlingual links to English and to the remaining five CESAR languages. In the first phase, one lexicon is built for each CESAR language; these lexicons are then merged by grouping together all entries that are connected by interlingual links. If more than one article in a language is connected to a group of articles in other languages (such cases are actually errors in the structure of the Wikipedias), all article titles are retained, separated by semicolons. An example of such an entry is "Астеци Империја Астека". In the final phase, category information from the English Wikipedia is added, with categories separated by semicolons, and for each non-English entry the number of links to that page in the Wikipedia of the respective language is given.
dc.identifier.uri: http://hdl.handle.net/20.500.14615/2-33
dc.language.iso: hrv
dc.language.iso: bul
dc.language.iso: eng
dc.language.iso: hun
dc.language.iso: pol
dc.language.iso: srp
dc.language.iso: slk
dc.publisher: University of Zagreb, Faculty of Humanities and Social Sciences
dc.rights: The MIT Licence
dc.rights.label: PUB
dc.rights.uri: https://zzl-ffzg.mit-license.org/
dc.subject: Headword lists
dc.subject: Wikipedia
dc.subject: interlingual links
dc.subject: Wikipedia articles
dc.subject: Croatian language
dc.subject: multilingual
dc.subject: Bulgarian language
dc.subject: HUMANITIES and RELIGION::Languages and linguistics::Other Germanic languages::English language
dc.subject: HUMANITIES and RELIGION::Languages and linguistics::Slavic languages::Polish language
dc.subject: Slovakian language
dc.subject: Hungarian
dc.subject: Serbian language
dc.title: CESAR Aligned Wikipedia Headwords List
dc.type: lexicalConceptualResource
local.contact.person: Marko Tadić marko.tadic@ffzg.hr Faculty of Humanities and Social Sciences, University of Zagreb
local.files.count: 1
local.files.size: 101340668
local.has.files: yes
local.language.name: Croatian
local.language.name: Bulgarian
local.language.name: English
local.language.name: Hungarian
local.language.name: Polish
local.language.name: Serbian
local.language.name: Slovak
local.size.info: 762662 entries
local.sponsor: euFunds CIP-ICT-PSP-2009-4: 271022 European Commission Central and South-East European Resources
metashare.ResourceInfo#ContentInfo.detailedType: wordList
metashare.ResourceInfo#ContentInfo.mediaType: text
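The merging phase described in dc.description (grouping entries connected by interlingual links, and joining several titles of one language with semicolons) can be sketched as a connected-components computation. This is only an illustrative sketch, not the authors' code: the function name `merge_lexicons`, the `(language, title)` node representation, and the union-find approach are assumptions, and the format of the distributed file is not taken from the record.

```python
from collections import defaultdict


def merge_lexicons(links):
    """Group (lang, title) pairs that are connected by interlingual links.

    `links` is an iterable of ((lang, title), (lang, title)) pairs
    (hypothetical input format). Returns one dict per group, mapping a
    language code to its titles joined by semicolons.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        # Path halving keeps the union-find trees shallow.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for source, target in links:
        union(source, target)

    # Collect titles per group and per language.
    groups = defaultdict(lambda: defaultdict(list))
    for node in list(parent):
        lang, title = node
        groups[find(node)][lang].append(title)

    # Several titles in one language are retained, separated by semicolons.
    return [
        {lang: ";".join(sorted(titles)) for lang, titles in by_lang.items()}
        for by_lang in groups.values()
    ]


# Toy data: only the two Serbian titles come from the description;
# the English and Croatian titles are made up for illustration.
toy_links = [
    (("en", "Aztec"), ("sr", "Астеци")),
    (("en", "Aztec"), ("sr", "Империја Астека")),
    (("en", "Aztec"), ("hr", "Asteci")),
]
merged = merge_lexicons(toy_links)
```

With the toy links above, both Serbian titles end up in the same group as the English and Croatian ones, which mirrors the multi-title case the description calls a structural error in the Wikipedias.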
This item is Publicly Available and licensed under: The MIT Licence
Files in this item
Name: archive.zip
Size: 96.65 MB
Format: application/zip
Description: zip
MD5: c09af9268e87a2e46a17ce3f1fb3d406
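A downloaded copy of archive.zip can be checked against the size and MD5 values listed in this record. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download` and the local file path are assumptions, not part of the repository.

```python
import hashlib
import os

# Values copied from this record.
EXPECTED_MD5 = "c09af9268e87a2e46a17ce3f1fb3d406"
EXPECTED_SIZE = 101340668  # bytes, from local.files.size


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Check the byte size first (cheap), then the MD5 digest."""
    if os.path.getsize(path) != expected_size:
        return False
    return md5_of_file(path) == expected_md5


# The listed size in MB matches the byte count when 1 MB is read as 2**20 bytes.
size_mb = round(EXPECTED_SIZE / 2**20, 2)
```

A call such as `verify_download("archive.zip")` (hypothetical local path) returns True only if both the size and the checksum match the values above.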