HR-CLARIN Repository Home

The French-Croatian contrastive valency database FraCroVal contains detailed valency frames for French and Croatian verbs of visual perception. FraCroVal displays words and their valency frames in an interactive, intuitive interface. This interface supports the analysis and visualization of linguistic data, as well as browsing and a deeper understanding of syntactic and semantic structures. The navigation frame at the top contains links to the project description, semantic-syntactic categories, events and a list of verbs. For example, the "Verbs" link displays a list of verbs, while the central frame shows details for the selected verb, such as its semantic categories, semantic extensions, fixed expressions and links to external resources. Croatian verbs in FraCroVal are also linked to other online resources, such as HOBS and the Croatian Language Portal, while French verbs are linked to the database of the Centre National de Ressources Textuelles et Lexicales (CNRTL).

This item contains 1 file (48 B).

Publicly Available

lexicalConceptualResource | HR-CLARIN: FFZG

DigilexPlus - Digital German-Croatian Dictionary of New Words (Digitalni njemačko-hrvatski rječnik novih riječi)

Author(s):

Skender Libhard, Inja; Hrastov, Kristina; Jambrek, Monika and Skok, Jakov

Description:

The digital German-Croatian dictionary of new words currently consists of three parts. The first part covers new words up to 2015, the second covers new words from 2015 onward, and the third is called Koronarječnik. The dictionary contains more than 6,000 new German lexemes (neologisms, loanwords and word combinations) and new meanings, together with their Croatian translations. It is regularly updated with new entries.

This item contains 1 file (29 B).

Publicly Available

lexicalConceptualResource | HR-CLARIN: FFZG

CESAR Aligned Wikipedia Headwords List

Author(s):

Ljubešić, Nikola and Tadić, Marko

Description:

The lexicon's 762,662 entries are built from the Wikipedia dumps of the six CESAR languages, using article titles and the interlingual links to English and to the other five CESAR languages. In the first phase, a separate lexicon is built for each CESAR language. These lexicons are then merged by grouping together all entries connected by interlingual links. If more than one article in a given language is linked to the same group of articles in other languages (such cases are errors in the link structure of the Wikipedias), all of those article titles are retained, separated by semicolons. An example of such an entry is "Астеци; Империја Астека". In the final phase, category information from the English Wikipedia is added, with categories separated by semicolons, and each non-English entry is given the number of links pointing to that page in the Wikipedia of its language.

This item contains 1 file (96.65 MB).

Publicly Available

Most Viewed Items - Last Month

corpus | HR-CLARIN: FFZG

HR-GPT Beta Data Collection

Author(s):

Štefanec, Vanja; Thakkar, Gaurish; Tadić, Marko; Farkaš, Daša and Filko, Matea

Description:

For additional information about the data sources, see the following publication: https://www.croris.hr/crosbi/publikacija/prilog-skup/849552

Publicly Available

toolService | HR-CLARIN: FFZG

HR-GPT Beta Large Language Model

Author(s):

Thakkar, Gaurish; Štefanec, Vanja; Filko, Matea; Farkaš, Daša and Tadić, Marko

This item contains 2 files (538.66 MB).

Publicly Available

corpus | HR-CLARIN: FFZG

RomCro v.2.0 - Parallel corpus of Romance languages and Croatian

Author(s):

Mikelenić, Bojana; Bikić-Carić, Gorana; Bezlaj, Metka; Oliver, Antoni and Tadić, Marko

Description:

The corpus contains original texts and their translations in all seven languages, and the order of the segments has been changed. The first version (RomCro v.1.0) was published in 2022. RomCro v.2.0 contains 33 original texts, 213 texts in total, 166,738 translation units and 19.4 million words, an increase of 3.7 million words over the previous version. Unlike v.1.0, v.2.0 also includes texts in Catalan.

This item contains 2 files (301.01 MB).

Publicly Available