Mathias Nylund
Centria has recently published an extensive series of educational videos on entrepreneurship in Finland. The series, called “Företagarens videoguide” (The Entrepreneur's Video Guide), can be found on the YouTube channel of the same name. It consists of 34 episodes and aims to cover the most crucial aspects of private entrepreneurship, such as funding, taxation and bookkeeping, to name just a few. Since the primary target group for these Swedish-language videos is immigrants, the spoken information is subtitled. Initially, the idea was to hire translators, translate into as many languages as possible and manually insert every single phrase at the correct time in the videos. This is very expensive and time-consuming. What if it could all be done automatically and for free?
There are countless examples of bad translation, produced both by human hand and with the help of Google Translate or other automated translation tools. Failed translations can be quite amusing when they are demonstrably incorrect or take on new and often humorous meanings. It has long been common knowledge that automated translation is a tool for understanding the gist of a foreign-language text, and that the reader has to set aside the normal expectation of correctness and linguistic quality. Has automatic translation improved? Or does Google Translate still get everything wrong in the same laugh-out-loud way as before?
Different kinds of text input
Automatic translation is certainly improving in a purely linguistic and technical sense, but one lesser-known factor that determines the quality of the output is the way in which the text is fed to the software. Many bad cases of automatic subtitling stem from the fact that the original language is picked up from the video by speech recognition. This means that the translation software starts by reconstructing the original script from the speaker's voice. This undermines the chances of a good translation in two ways. First, normal speech is often pronounced unclearly, which makes it less likely to be recognized correctly by software. Second, when people speak spontaneously, they pay less attention to speaking in full sentences and following the rules of grammar exactly. Especially in talking-head videos, where the speaker can also convey meaning through hand gestures and facial expressions, it is less necessary for the spoken sentences to be fully coherent. Speaking ungrammatically can also be a way to express a certain style.
With all this in mind, it is a tall order to demand that a speech recognition program turn occasionally slurred speech and incomplete sentences into beautiful prose. The solution is to provide the translation software with a written script. The translation should not be based on what Google Translate can or cannot pick up from the video's audio track. It should be based on a manuscript that is well written and preferably proofread for typos and grammatical inconsistencies. This script can be written before the video is recorded, or it can be made afterwards as a transcript of what is said in the video. This is the way to let automated translation software flex its muscles. We have to give Google Translate a fair chance before we laugh at it.
Creating subtitles on YouTube
There are many commercial programs for creating subtitles, but the most commonly used tool is YouTube itself, which is also where most videos are uploaded. To create subtitles, one must enter the edit mode of the uploaded video and insert the text manually. If you have recorded the video using a manuscript, you can copy the entire text into YouTube's subtitle tool. However, this is not all. The subtitles must be divided into portions so that each line of text appears exactly when the speaker utters the corresponding words. One must also choose suitably sized chunks, so that neither too little nor too much text is displayed on the screen at once. For instance, it does not look good if the subtitles occupy more than two lines at the bottom of the screen. This takes some getting used to, but once you are accustomed to how the tool works, it is possible to enter roughly one A4 page of text correctly in an hour or so.
If the video does not have a manuscript, you can transcribe the speech from the video. That means you watch the video and create a text document with everything that is said in it. It is also possible to start editing the video subtitles directly and transcribe as you go along. However, it may be clearer to transcribe first, review the complete text and consider changes before you start inserting the text into YouTube.
The viewer of your YouTube video needs to activate subtitles by clicking the subtitles icon at the bottom of the screen. This displays the subtitles that have been manually inserted into the video. The viewer can then open the settings by clicking the gear icon and choose Automatic Translation. At present, there are 40 languages to choose from.
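The portioning step described above can be roughed out in a few lines of Python before any text is pasted into the subtitle tool. This is only a sketch: the function name `split_into_cues` is made up for illustration, and the limit of 42 characters per line is an assumption rather than a figure from YouTube or from this article. Only the two-line limit comes from the text above. The timing of each portion still has to be set by hand.

```python
import textwrap

MAX_CHARS_PER_LINE = 42  # assumed line length, not a YouTube requirement
MAX_LINES_PER_CUE = 2    # at most two lines on screen at the same time


def split_into_cues(manuscript, width=MAX_CHARS_PER_LINE, max_lines=MAX_LINES_PER_CUE):
    """Split a manuscript into subtitle portions of at most `max_lines` short lines."""
    # Wrap the running text into lines no longer than `width` characters,
    # breaking only between words (never at hyphens inside a word).
    lines = textwrap.wrap(manuscript, width=width, break_on_hyphens=False)
    # Group consecutive lines into portions of at most `max_lines` lines each.
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


if __name__ == "__main__":
    text = (
        "If you have recorded the video using a manuscript, you can copy "
        "the entire text into the subtitle tool and then split it into "
        "suitable portions."
    )
    for number, cue in enumerate(split_into_cues(text), start=1):
        print(number)
        print("\n".join(cue))
        print()
```

A purely mechanical split like this ignores sentence boundaries and the rhythm of the speaker, so the result is best treated as a first draft to be adjusted while watching the video.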
How good is automated translation?
When the software is fed well-written and grammatically correct text, it often does a surprisingly good job. Sometimes the translated text is indistinguishable from an original text, or reads as if it had been made by a proficient translator. Sometimes there are grammatical errors or poor choices of terminology. Translation between related languages has a higher success rate than translation between languages that are structurally very different. Our video series on entrepreneurship is spoken in Swedish, and the translation into English is mostly very good, while the translation into Finnish is decent but contains more mistakes.
Translations can also be steered and further improved. The first step is to make sure that the original manuscript is free of errors and typos. Secondly, it is good to keep in mind that the spoken word in the video and the subtitle manuscript in the same language do not have to be identical. To improve the translation, one can use simpler words or grammatical constructions in the subtitle manuscript. In general, one should avoid everything that might confuse the translation software, for instance abbreviated words and elliptical clauses. It is very hard for the computer to work out what has been left out. In addition, one should be very careful with idiomatic expressions and humor based on wordplay.
It can make sense to enter subtitles in a language other than the one spoken in the video. Obviously, this provides the video with translated subtitles that are human-made and hopefully of good quality. It can also make sense if the original language is very different from the languages of potential viewers. This is especially true in a Finnish context. Investing time in manually translating a Finnish video into English not only creates English subtitles, it also gives the automated translation software raw material that is better suited for translation into other Indo-European languages.
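The advice about abbreviations can be partly automated before the subtitle manuscript is entered. The following is a minimal sketch, not a tool used in the project: the function name `expand_abbreviations` and the small table of Swedish abbreviations are chosen here for illustration only, and a real manuscript would need its own list.

```python
import re

# Illustrative table only; extend it with the abbreviations found in your own manuscript.
ABBREVIATIONS = {
    "t.ex.": "till exempel",
    "bl.a.": "bland annat",
    "osv.": "och så vidare",
}


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace each abbreviation in `table` with its written-out form."""
    # Try longer abbreviations first so a shorter one never wins over a longer match.
    keys = sorted(table, key=len, reverse=True)
    # The lookbehind prevents matches that start in the middle of a word.
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + ")")
    return pattern.sub(lambda match: table[match.group(0)], text)


if __name__ == "__main__":
    print(expand_abbreviations("Vi behandlar t.ex. bokföring och bl.a. beskattning."))
```

The matching is case-sensitive, and an abbreviation that ends a sentence takes the final full stop with it, so the output should still be proofread by a person.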
Why (or why not) start using Automated Translation?
The advantages are obvious. Automated translation is extremely cheap and fast compared with hiring translators and entering captions into a video by hand. However, this is not all. As with any type of artificial intelligence, we are still on the brink of a revolution. The quality of the translation is likely to become even better than it is today, and one day it might be perfect, at least in a technical sense of the word. Being an early adopter of automated translation can pay off in two ways. Firstly, as artificial intelligence conquers the field of information, there will be demand for knowledge of how to use the software and how to adapt texts to produce good translations. Secondly, the translations we make today on YouTube will automatically be updated in the future when improvements are made to the software. Traditional translations by a human seldom improve by themselves.
On the other hand, one must keep in mind that automated translations are still not perfect, albeit good and perfectly understandable. If poetry and literary quality mean anything, an automated translation may never become perfect. Perhaps automated translations are to linguistic beauty what fast food is to fine dining: fast and cheap, but we feel bad afterwards. Do we diminish ourselves by getting used to translations that not only contain errors but also lack the creativity and craftsmanship of a good author? There are many examples of literature where the translation is said to be better than the original, owing to the skill and genius of the translator.
The position we normally take vis-à-vis developments in the field of AI is that we can keep the computers as our slaves, but must never let them become our masters. Perhaps the same goes for automated translation.
Mathias Nylund
Project Coordinator
Centria University of Applied Sciences
tel. 040 808 5121