Please use the following text to cite this item:
Štefanec, Vanja; Thakkar, Gaurish; Tadić, Marko; Farkaš, Daša and Filko, Matea, 2024, HR-GPT Beta Data Collection, HR-CLARIN, http://hdl.handle.net/20.500.14615/2-14
| dc.contributor.author | Štefanec, Vanja |
| dc.contributor.author | Thakkar, Gaurish |
| dc.contributor.author | Tadić, Marko |
| dc.contributor.author | Farkaš, Daša |
| dc.contributor.author | Filko, Matea |
| dc.date.accessioned | 2025-01-20T10:56:15Z |
| dc.date.available | 2025-01-20T10:56:15Z |
| dc.date.issued | 2024-11 |
| dc.description | For more information about the data sources, see the following publication: https://www.croris.hr/crosbi/publikacija/prilog-skup/849552 |
| dc.description.abstract | This dataset contains deduplicated text used for pretraining HR-GPT Beta Large Language Models. |
| dc.description.sponsorship | Project code: EC/HORIZON-RIA/101070631/EU |
| dc.identifier.uri | http://hdl.handle.net/20.500.14615/2-14 |
| dc.language | hrvatski |
| dc.language | Croatian |
| dc.language.iso | hrv |
| dc.relation | info:eu-repo/grantAgreement/EC/HORIZON-RIA/101070631 |
| dc.rights | The MIT Licence |
| dc.rights.label | PUB |
| dc.rights.uri | https://zzl-ffzg.mit-license.org/ |
| dc.source.uri | https://hr-xr-xtend.ffzg.unizg.hr |
| dc.subject | Large Language Models |
| dc.subject | LLM |
| dc.subject | Croatian language |
| dc.subject | Extended reality |
| dc.subject | veliki jezični modeli |
| dc.subject | hrvatski jezik |
| dc.subject | proširena zbilja |
| dc.title | HR-GPT Beta Data Collection |
| dc.type | corpus |
| local.bitstream.file | https://s3.storage.srce.hr/repository.clarin.hr/hr-xtend-training-deduplicated-data/deduplicated_output_merged_file.jsonl |
| local.bitstream.redirectToURL | https://s3.storage.srce.hr/repository.clarin.hr/hr-xtend-training-deduplicated-data/deduplicated_output_merged_file.jsonl |
| local.contact.person | Gaurish Thakkar, gthakkar@m.ffzg.hr, University of Zagreb |
| local.files.count | 1 |
| local.files.size | 123 |
| local.has.files | yes |
| local.size.info | 7.3 billion words |
| local.sponsor | EU EC/HORIZON-RIA/101070631/EU European Commission Unified Transcription and Translation for Extended Reality info:eu-repo/grantAgreement/EC/HORIZON-RIA/101070631/EU |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
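
The record above lists a single `.jsonl` bitstream. A file of that size is usually processed one line at a time rather than loaded whole. The following is a minimal sketch of that pattern, not a documented interface of this dataset: the record does not describe the JSON schema, so the `text` field name is an assumption, and the helper names `iter_jsonl` and `count_words` are hypothetical. The sample input is an in-memory stand-in, not data from the corpus.

```python
import io
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_words(records: Iterable[dict], field: str = "text") -> int:
    """Sum whitespace-separated tokens found in `field`.

    The default field name "text" is an assumption; check the actual
    keys in the file before relying on it.
    """
    return sum(len(str(rec.get(field, "")).split()) for rec in records)


# Tiny in-memory stand-in for the real file (assumed schema).
sample = io.StringIO('{"text": "Dobar dan svima"}\n\n{"text": "Hvala"}\n')
total = count_words(iter_jsonl(sample))
print(total)
```

With a local copy of the file, the same functions accept an open file handle, for example `open(path, encoding="utf-8")`, since a text file object is itself an iterable of lines.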

