Staff View: Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

Main Authors:	Bharathi Raja Chakravarthi, Mihael Arcan, John P. McCrae
Format:	Proceeding
Language:	eng
Published:	2019
Online Access:	https://zenodo.org/record/3266918

ctrlnum	3266918
fullrecord	<?xml version="1.0"?> <dc schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><creator>Bharathi Raja Chakravarthi</creator><creator>Mihael Arcan</creator><creator>John P. McCrae</creator><date>2019-05-20</date><description>Under-resourced languages are a significant challenge for statistical approaches to machine translation, and recently it has been shown that using training data from closely related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes it difficult to take advantage of these similarities. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration to the Latin script with fine-grained IPA transliteration. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show improvements in BLEU, METEOR, and chrF scores from transliteration, and we find that transliteration into the Latin script outperforms the fine-grained IPA transcription.</description><identifier>https://zenodo.org/record/3266918</identifier><identifier>10.4230/OASIcs.LDK.2019.6</identifier><identifier>oai:zenodo.org:3266918</identifier><language>eng</language><relation>info:eu-repo/grantAgreement/EC/H2020/731015/</relation><rights>info:eu-repo/semantics/openAccess</rights><rights>https://creativecommons.org/licenses/by/4.0/legalcode</rights><title>Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages</title><type>Journal:Proceeding</type><type>Journal:Proceeding</type><recordID>3266918</recordID></dc>
language	eng
format	Journal:Proceeding Journal
author	Bharathi Raja Chakravarthi Mihael Arcan John P. McCrae
title	Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages
publishDate	2019
url	https://zenodo.org/record/3266918
contents	Under-resourced languages are a significant challenge for statistical approaches to machine translation, and recently it has been shown that using training data from closely related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes it difficult to take advantage of these similarities. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration to the Latin script with fine-grained IPA transliteration. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show improvements in BLEU, METEOR, and chrF scores from transliteration, and we find that transliteration into the Latin script outperforms the fine-grained IPA transcription.
id	IOS16997.3266918
institution	DEFAULT
institution_type	library:public library
library	DEFAULT
collection	DEFAULT
city	DEFAULT
province	DEFAULT
repoId	IOS16997
first_indexed	2022-06-06T03:23:40Z
last_indexed	2022-06-06T03:23:40Z
recordtype	dc
merged_child_boolean	1
_version_	1739475830575202304
score	17.204899
