MTradumàtica: An Open Source Statistical Machine Translation Platform - I

in #utopian-io · 7 years ago (edited)

Introduction

MTradumàtica is a free and open source statistical machine translation platform developed by Universitat Autònoma de Barcelona and Prompsit Language Engineering. It is currently in beta. The project aims to make the development of custom machine translation engines accessible to freelance translators and small language service providers, so that they can stay competitive with big corporations.

Statistical Machine Translation and MTradumàtica

Statistical Machine Translation

Statistical Machine Translation (SMT) has been the leading paradigm in the machine translation industry for more than 20 years. It is a corpus-based approach: you need large amounts of bilingual parallel corpora to statistically train an engine for a particular language pair (for example, English - Turkish). It is widely accepted that if your training corpus covers a certain domain (for example, IT or medicine), the SMT engine will yield better results when you translate texts from the same domain. It should also be noted that SMT works better between languages that are syntactically similar. Between 2006 and 2016, Google used a generic SMT system for the languages it supports, and it has only recently migrated to a neural machine translation (NMT) system. [However, Google still uses SMT in its CAT Tool API plugins.] Microsoft is now going through the same process.
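To make "statistically train an engine" a little more concrete, here is a minimal sketch of IBM Model 1, the textbook word-alignment model that learns word translation probabilities from sentence pairs with the EM algorithm. It is only a toy illustration: the three-sentence English - Turkish corpus and the function names are made up for this post, and real systems such as Moses build much richer phrase-based models on top of alignments like these.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with EM (IBM Model 1, no NULL word)."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    # Start uniform over every word pair that co-occurs in some sentence pair.
    t = {
        (w_t, w_s): 1.0 / len(target_vocab)
        for src, tgt in pairs for w_s in src for w_t in tgt
    }
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        for src, tgt in pairs:
            for w_t in tgt:
                # E-step: split one unit of alignment mass among the source words.
                norm = sum(t[(w_t, w_s)] for w_s in src)
                for w_s in src:
                    share = t[(w_t, w_s)] / norm
                    count[(w_t, w_s)] += share
                    total[w_s] += share
        # M-step: renormalize expected counts into probabilities.
        for (w_t, w_s), c in count.items():
            t[(w_t, w_s)] = c / total[w_s]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {w_t: p for (w_t, w_s), p in t.items() if w_s == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus: (tokenized English, tokenized Turkish).
corpus = [
    ("red house".split(), "kırmızı ev".split()),
    ("red book".split(), "kırmızı kitap".split()),
    ("blue book".split(), "mavi kitap".split()),
]
table = train_ibm_model1(corpus)
for word in ["red", "house", "book", "blue"]:
    print(word, "->", best_translation(table, word))
```

Nothing in the code knows any Turkish. The pairings emerge only because words that keep appearing together across sentence pairs accumulate probability, which is also why more (and more domain-specific) parallel data tends to help.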

Thanks to the development of the free and open source SMT system called Moses (http://www.statmt.org/moses/), the use of SMT became widespread and many translation companies began using custom SMT systems based on Moses. However, adoption among freelance translators has been very limited, because setting up Moses is technically difficult and resource-intensive, and because most translators work on Windows (major CAT Tools only run on Windows). SMT offers productivity and consistency gains for translators who already have their own translation memories, which are exactly the bilingual parallel corpora needed to train an SMT engine. Hence, a user-friendly SMT platform that can integrate with translators' CAT Tools is crucial for empowering freelance translators. MTradumàtica aims to close this gap and provide an easy-to-use platform for translators.
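As a rough illustration of how a translation memory becomes training data, the sketch below converts a TMX export into the two line-aligned plain-text files that Moses-style training expects (one sentence per line, line N of one file translating line N of the other). TMX is assumed here as the exchange format because most CAT Tools can export it; the sample content and function names are invented for this post, and you should check which upload formats MTradumàtica itself accepts.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample; a real TMX export would come from your CAT Tool.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="tr"><seg>Dosyayı kaydedin.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>The printer is not connected.</seg></tuv>
      <tuv xml:lang="tr-TR"><seg>Yazıcı bağlı değil.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Untranslated segment.</seg></tuv>
      <tuv xml:lang="tr"><seg></seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) sentence pairs, skipping incomplete units."""
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Compare only the primary language subtag: "en-US" -> "en".
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower().split("-")[0]
            seg = tuv.find("seg")
            if seg is not None:
                # Drop inline markup and collapse whitespace onto one line.
                segments[lang] = " ".join("".join(seg.itertext()).split())
        if segments.get(src_lang) and segments.get(tgt_lang):
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


def to_line_aligned(pairs):
    """Build the contents of the two parallel files, one sentence per line."""
    src_text = "".join(src + "\n" for src, _ in pairs)
    tgt_text = "".join(tgt + "\n" for _, tgt in pairs)
    return src_text, tgt_text


pairs = tmx_to_pairs(SAMPLE_TMX, "en", "tr")
src_text, tgt_text = to_line_aligned(pairs)
print(src_text, tgt_text, sep="---\n")
```

Keeping the two outputs strictly line-aligned matters more than anything else here: a single dropped line on one side silently shifts every later sentence pair.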

MTradumàtica

About

Currently, a desktop and a web version of MTradumàtica are available on GitHub. The web platform is connected to the server of Universitat Autònoma de Barcelona. You can rapidly train custom SMT engines in only 5 steps, using either the parallel corpora in the OPUS Corpus or your own corpora. However, all corpora are currently visible to everyone visiting the website. Hence, if you wish to keep your corpora private, it is advisable to use the desktop version or wait until the user management feature is implemented.

To build a fully functioning SMT engine, both a translation model and a language model must be trained. In my next blog post, I will show how to train your engine step by step and discuss the missing features that need to be added to make MTradumàtica fully useful for freelance translators. In my opinion, democratizing a powerful tool like machine translation is very important for the future of humanity, and for translators in particular. Although some translators demonize machine translation, most people accept it as a useful tool in their working process. I am convinced that many translators will start using these systems once they become more accessible and easier to use. This is also true for neural machine translation: although there are many open source neural machine translation toolkits (https://github.com/jonsafari/nmt-list), they are still beyond the reach of mere-mortal freelance translators.
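For readers wondering why two separate models are needed, the classic noisy-channel formulation of SMT shows how they divide the work. This is a textbook sketch rather than a description of MTradumàtica's internals, and the symbols are notation introduced here: f is the source sentence and e is a candidate translation.

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}} \;
                       \underbrace{P(e)}_{\text{language model}}
```

The translation model, trained on the parallel corpus, scores how well e matches the meaning of f, while the language model, trained on target-language text alone, scores how fluent e is. Moses-style systems generalize this product into a weighted log-linear combination of several such scores.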

To try MTradumàtica, visit: m.tradumatica.net
For a workshop on it: https://www.abumatran.eu/wp-content/uploads/2017/01/dcu-nov-2016-guide.pdf



Posted on Utopian.io - Rewarding Open Source Contributors


@gokhandogru, Upvote for supporting you.

Your contribution cannot be approved because it does not follow the Utopian Rules.


According to the blog post rules:

  • "You must provide an original and unique editorial content of very high quality."
  • "Blog posts must provide detailed content and overviews related to the open-source projects."

Unfortunately, your post looks too plain, and the information it provides is very superficial and does not add value to the project. The blog posts category is one where we need good-looking posts.
If you are considering writing a step-by-step guide on how to train an engine for your next post, please consider submitting it in the tutorials category.


You can contact us on Discord.
[utopian-moderator]

@kit.andres thank you very much for the feedback. I just wasn't sure how deep and technical the post should be. I will keep your suggestions in mind.
