Browser-beijin Blog!

Translation and programming...

GoogleCloud'n Glossary Update

May 20th, 2020

So, from late 2019 to early 2020, I developed and released (via this site) the GoogleCloud'n Glossary plugin for Trados Studio 2017 and 2019. It's a plugin with which you can use the Google Translate API to translate sentences within Trados, while ensuring that the terms used in the translated sentences follow the terminology in a glossary (Termbase) of your choosing. You can thus control the terminology used during machine translation, and as a result, cut down on time spent post-editing. Since coming up with a working version of the plugin in late 2019, I have used it in my own work dozens of times, and it's increased the quality of the translation results from Google immensely (and thus saved me proofreading time).

As for the inner workings of the original plugin, it basically read all the text from the *.sdlxliff files in a user's project, looking for specific XML tags that denote the beginning and end of source segments. It copied the text between such XML tags, sent it to the Cloud to be translated, then pasted the results back into the *.sdlxliff file. That system had 3 drawbacks:

1) Segment-by-segment translation is not possible--files can only be batch translated
2) If any errors occur during the automatic copy-pasting process, the *.sdlxliff file gets corrupted and can no longer be opened in Trados
3) The plugin settings (such as credentials/product key) are not saved between sessions--they have to be re-input every time
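
To make the read-translate-write loop above concrete, here is a minimal sketch in Python. It is not the plugin's actual code (the plugin is written in C#), and it assumes a heavily simplified, hypothetical file layout with bare `trans-unit`, `source` and `target` elements; real *.sdlxliff files carry namespaces and far more markup. The `fake_translate` function is a stand-in for the call to the cloud service. The sketch uses an XML parser rather than raw copy-pasting, and re-parses its own output before returning it, which is one way to avoid handing a corrupted file back to the user.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for an *.sdlxliff file.
SAMPLE = """<xliff><file><body>
<trans-unit id="1"><source>運転状態</source><target></target></trans-unit>
<trans-unit id="2"><source>有効</source><target></target></trans-unit>
</body></file></xliff>"""


def fake_translate(text):
    """Placeholder for the request sent to the cloud translation service."""
    return {"運転状態": "Operating state", "有効": "Active"}[text]


def batch_translate(xml_text, translate):
    """Fill every <target> with the translation of its sibling <source>."""
    root = ET.fromstring(xml_text)  # raises ParseError if the input is malformed
    for unit in root.iter("trans-unit"):
        source = unit.find("source")
        target = unit.find("target")
        if source is None or target is None:
            continue  # skip units that lack either element
        target.text = translate(source.text or "")
    result = ET.tostring(root, encoding="unicode")
    ET.fromstring(result)  # re-parse: only return output that is well-formed
    return result


translated = batch_translate(SAMPLE, fake_translate)
```

Because everything happens in memory, a failure partway through raises an exception and leaves the original file untouched.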

These 3 points were brought to my attention after I submitted the plugin to the SDL Appstore for approval and release (in early April 2020). Thanks to feedback and guidance (after a lot of plugin test usage) from my contact at SDL, I rebuilt the plugin almost entirely. Specifically, I transformed it into an "MT Provider" for use within Trados. Basically, this means that it now works just like the Google API translate function that is included by default in Trados. You can translate segment-by-segment (or in batch), and settings are saved between sessions. Most reassuringly, the plugin no longer reads and messes with the XML tags in the *.sdlxliff files, so file corruption is a thing of the past. Here is a screenshot of the plugin in its new form:

The new version of the plugin is currently in the process of being approved for release in the SDL Appstore. I'll post again with an update when I hear the verdict.

A 10-page "Style Guide" for a 3-sentence translation request?

March 20th, 2020

So, my commute to work takes about 45 minutes, one way. For 35 of those minutes, I stand compactly in a crowded subway car, and usually read one of my Matsumoto Seicho or Jun Ikeido novels. 35 minutes in the morning, and 35 minutes in the evening. That's just over an hour of reading, every weekday. All that reading time is fun, and all, but after about a year of it, I began to wonder if there were another, more useful way to spend my commuting time.

In the end, I decided that I could probably do freelance translation work during my commute. Of course, one's range of movement is severely limited within a crowded subway car, but I figured as long as I had 1) a hard copy of whatever document I was working on and 2) a pen or pencil, I could make something work. I created an Upwork account and profile, and applied for one of the first jobs I found--a gig translating Japanese newspaper articles into English, for distribution/consumption in India, of all places.

By the way, Upwork (if you haven't heard of it) is an online marketplace, of sorts, where people or companies post job offers, and other people respond to those job offers, setting their own per-hour wage. Due to the fact that anyone can post job offers, the jobs available can be quite absurd. Seriously, I found a job posted by someone a few months ago that was like, "I can't read the address on this postcard I have, can someone decipher it for me? My budget is $5." Also, due to the fact that workers set their own wages (and the wages being set for each particular job are kept secret), everyone has a tendency to work for less money than they would otherwise. Finally, because it's all online and more or less instantaneous, you also have this feeling like you have to respond immediately and affirmatively if some potential employer happens to send you a job offer ('cause there are probably hundreds of other Upwork users with the same skills as you, waiting in the wings). Now that I think about it, it has all the negative (or, from the point of view of an economy based on consumption, positive I suppose) points that eBay has. When you buy stuff in an auction-type environment such as eBay, a fear of being outbid drives you to complete the buying process in less time, and with less thought, than you normally would. Just the same, when you search for work in an auction-type environment such as Upwork, a fear of being underbid drives you to complete the job search process more quickly and casually than you would have otherwise. The bottom line being, when I found this job translating Japanese newspaper articles into English for the Indian market, I was like "Yes, I can translate for 5 cents per character (about 1/3rd the industry standard)! Yes, while riding a subway, using a paper and pencil! I'll do it!" 
It was a snap decision--the sort of snap decision eBay, Upwork and the like are apt to cause--made while typing messages back and forth to my future potential boss via a small chat window in my web browser.

Anyway, after agreeing to take the job, I was sent a Style Guide, to be used and referred to during translation. It was seriously five or six pages long, and full of rules regarding translation. I was also sent my first translation assignment, which was 2 pages of newspaper headlines and short articles. It was then that I thought, "Why should I have to read (and internalize) a six-page style guide, just to translate a two-page document?" The style guide wasn't full of the sort of information that could be easily internalized, either, as it had very specific rules regarding dozens of special words used in the newspaper articles. It was the kind of style guide you'd have to keep on hand while translating, and constantly refer to. Referring to a bulky style guide. Standing in a subway car. Working for 5 cents per character (which comes out to about $8 per hour, based on my translation speed).

In the end, I didn't end up taking this job (I was evidently underbid by someone, and communication with my future boss ended as suddenly as it had begun). However, this experience with the crazy style guide got me thinking, "Wouldn't it be great to embed style-related comments into a translation source document, only in the places where they apply?" Well, now you can, with Pinpoint_StyleGuide. It's available on the SDL Appstore, and allows you to embed style-related comments directly into Trados files. It's free, so download and try it if you'd like!


Ever wanted to read & edit a *.tmx file without installing a special program?

March 6th, 2020

Last week, I was asked to proofread a 10,000-segment Translation Memory file, to check it for errors in meaning (of which there are probably dozens, as my customer has been using the same translation memory for years and years, adding to it bit by bit). The problem was, I had to proofread it on a laptop, which didn't have Trados Studio installed. Of course, if you have Trados Studio installed, Translation Memory files are easy to read and browse--using the "Translation Memories" view in Trados. If you don't have Trados (or some third-party software) installed, you can't very well--as a human--read these *.sdltm files.

My bright idea was to export the translation memory file as a *.tmx file, then use the "Find and Replace" function in Notepad to find all of the non-important elements and delete them. As you will know if you've ever looked into the innards of a *.tmx (Translation Memory Exchange) file, the non-important elements make up about 90% of the file. It's a mess of XML tags, time-codes, and strings of numbers. Using my special "Find and Replace via Notepad" idea, it took me about an hour to clean up the *.tmx file to the point where it contained only the source and target sentence pairs. "Surely, there must be an app for this," I thought.

Searching online, I found precious little in the way of *.tmx-file reading options. There are programs, of course, you can download and install. But how about a web-based, one-click TMX file viewing/editing service? There are currently no options for that. So, in a few spare hours I had between translation projects today, I built an online TMX editor. It's available to use on this very website, over here. You simply copy-paste the contents of a *.tmx file into a text box and click a button, and a source-target HTML chart is generated, right within the web browser! You can edit the segments within the chart itself, then turn the file back into a *.tmx file. Hope you like it, please give it a try!
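
For the curious, the core idea (strip everything except the source-target pairs, then show them in a chart) can be sketched in a few lines of Python. This is only an illustration, not the code behind the online editor. The sample TMX content and the function names `tmx_pairs` and `pairs_to_html` are made up for the example, and it assumes each translation unit holds one Japanese and one English variant.

```python
import xml.etree.ElementTree as ET
from html import escape

# ElementTree expands the xml:lang attribute to this fully qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny made-up TMX sample; real exports carry many more attributes.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="ja-JP" datatype="plaintext" segtype="sentence"
          adminlang="en-US" o-tmf="demo" creationtool="demo"
          creationtoolversion="1"/>
  <body>
    <tu creationdate="20200306T000000Z">
      <tuv xml:lang="ja-JP"><seg>運転状態</seg></tuv>
      <tuv xml:lang="en-US"><seg>Operating state</seg></tuv>
    </tu>
    <tu creationdate="20200306T000000Z">
      <tuv xml:lang="ja-JP"><seg>有効 &amp; 無効</seg></tuv>
      <tuv xml:lang="en-US"><seg>Active &amp; inactive</seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_pairs(tmx_text, src="ja", tgt="en"):
    """Return (source, target) text pairs, ignoring all other TMX metadata."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # Keep only the primary language subtag ("ja-JP" becomes "ja").
            lang = (tuv.get(XML_LANG) or "").lower().split("-")[0]
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text sitting inside inline tags.
                segs[lang] = "".join(seg.itertext())
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


def pairs_to_html(pairs):
    """Render the pairs as a two-column HTML table."""
    rows = "".join(
        f"<tr><td>{escape(s)}</td><td>{escape(t)}</td></tr>" for s, t in pairs
    )
    return f"<table>{rows}</table>"


pairs = tmx_pairs(SAMPLE_TMX)
table = pairs_to_html(pairs)
```

Using a parser instead of "Find and Replace" means the time-codes and ID numbers never have to be deleted by hand; they are simply never read.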


Google Auto ML: Why would I need a custom model?

Feb. 19th, 2020

With the release of Google Translate V3, some months ago, it became possible to "create your own, custom translation models so that translation queries return results specific to your domain" (that's copy-pasted from the Auto ML documentation from Google). A very true statement about a very amazing technology--Google Auto ML can, in fact, "return results specific to your domain." However, what does that even mean? What are these "results specific to your domain," and how do they differ from results returned using the standard model available for anyone to use at https://translate.google.com? In what situations is it preferable to use a custom model? This blog post was written in the hope of shedding some light on those questions.

Translation Domains

What is this phrase "your domain" in the Google Auto ML documentation? Simply put, it means something akin to "field" or "industry." For example, I translate machine manuals, so my "domain" would be technical. A translator who translates law-related documents--their domain would be legal. A translator working with farming and farm equipment-related documents would be working in the agricultural domain. So basically, the Google Auto ML documentation is saying that training and using a custom model will return results specific to the industry in which you are translating. For example, if you type the French phrase "certificat d'urbanisme" into https://translate.google.com, it comes back as "planning certificate." But if you train a custom model using sentence pairs taken from FR->EN translated real estate and construction documents, that very same phrase--when translated with your custom model--would very likely come back as "zoning certificate" or something like that.

Isn't a glossary enough?

So, you may be thinking "If I just want Google Translate to return 'zoning certificate' when I input 'certificat d'urbanisme', isn't a glossary sufficient?" The answer is yes, 100%. In the vast majority of cases, use of a glossary alters the results from Google Translate enough that they become specific to your domain. If you simply want to use specific nouns in the translations, using a glossary is a perfect solution. In the course of my own technical translation work, there often appears the air conditioner-related word 'ベンチエール', which Google Translate (by default) always translates as "bench ale." In actuality, it should be translated as "HRV" (a domain-specific word), so I always use a glossary with Google Translate to make sure the translation results are "specific to my domain," as it were.
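
A glossary of this kind is, at heart, just a list of source-target term pairs. As a rough sketch, here is how the two examples above could be written out as a two-column CSV in Python. The two-column layout is my assumption of a typical one-directional glossary file, and the function name `glossary_csv` is made up for the example; check the Google documentation for the exact format it expects before uploading anything.

```python
import csv
import io


def glossary_csv(terms):
    """Serialize {source term: target term} as two-column CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for source_term, target_term in terms.items():
        writer.writerow([source_term, target_term])
    return buffer.getvalue()


# The two term pairs discussed above.
terms = {
    "ベンチエール": "HRV",
    "certificat d'urbanisme": "zoning certificate",
}
csv_text = glossary_csv(terms)
```

In practice a single glossary would cover one language pair, so the Japanese and French entries here would live in separate files.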

Need you a custom model, truly?

In most cases, using a glossary to assign specific nouns to be used within the translation is more than enough to cause Google Translate to return results specific to your industry or field. If you, for some reason, need to change the basic structure of sentences to make them align with your industry or field, then you'd need to use a custom model. For example, imagine that you translate questionnaires from Japanese into English for a living. But NOW imagine (who knows why) the English versions need to use medieval-style English. Something like this:

ペットを飼っていますか? --> Have you any pets?
あなたの家に何人が住んでいますか? --> Be there how many people living in your house?
最近映画を見ましたか? --> Seen you any movies lately?

If you copy-paste the Japanese on the left into https://translate.google.com, it returns very grammatically-standard English sentences. None of the "verb + subject + object" structure you need for your job translating questionnaires into medieval-style English, unfortunately. No matter what you do, the standard Google Translate will never, ever return a "verb + subject + object" structure interrogative sentence. Google has trained the standard model on modern English, so it can only return modern-sounding sentences. If you needed Google Translate to return sentences with a medieval-style grammatical structure, then you'd need to train a custom model (using sentence pairs such as the three above). After enough training, the model would begin producing this grammatical structure very consistently.

To summarize, Google Translate V3 offers users the ability to use both glossaries and custom models while translating. If you only need to use specific nouns in your translation (ones that would not be returned reliably by the standard Google Translate model), then you should use a glossary. If you need to change the entire structure of the sentences in your translation, then you should train and use a custom model. For most translators' needs, glossaries are enough. However, if you somehow get a great high-paying gig translating scripts for Medieval Times or something, by all means train yourself a custom model!

As you may know, using GoogleCloud'n Glossary, a plugin for Trados Studio 2017 and 2019 available on this site, you can batch-translate documents within Trados Studio 2017 or 2019, via the Google API, using a glossary, or even using custom models (which you can train using the sentence pairs within existing Trados Studio translation memories). Please check it out, and download it!


GoogleCloud'n Glossary: An example

Feb. 6th, 2020

So, you may have heard that version 3 of the Google Translate API has been released! It offers the ability to translate while following the terms laid out in a glossary. That is pretty amazing. I have built my own simple recurrent neural network for language detection, just as a hobby, so I know a little bit about artificial neural networks and the technology behind machine translation. That being said, I have no idea how Google is able to deliver machine translated content that contains certain user-defined terms! It's amazing, and will go a long, long way toward eliminating the slog and repetitive boredom from the post-editing process.

As outlined in the "Machine Translation to the rescue" post below, I recently used GoogleCloud'n Glossary to batch translate a 250-page specs manual that would have otherwise been immensely boring and drawn-out. Machine translation (via the Google API) was placed automatically into my Trados project file, using the terms designated in a termbase that I selected. To someone who's worked with the Google API in its version 2 form, it is quite a surprise, this new glossary functionality. The video below shows a comparison between using the Google API function built into Trados 2017 and 2019 and using the "GoogleCloud'n Glossary" plugin. (Be sure to turn on closed captions, as the video has no audio narration):




The "GoogleCloud'n Glossary" plugin was developed through the end of 2019, and version 1 was finished just a few weeks ago, in late January 2020. It's still pending approval from the SDL Appstore, but you can download an 'unsigned' version here on the website. I've searched the internet using the words "Trados Studio Google API V3" and "Trados Studio Google translate glossary," et cetera, and it looks like there are many translators waiting for a release of a plugin for Trados Studio which uses Google Translate API V3 & the wonderful glossary functionality thereof.

Well, peeps, it's here! It's not perfect, but I hope to make it so in the future. Please download it and check it out, then email me with bug-fix requests. I would welcome them!


Machine Translation to the rescue

Feb. 4th, 2020

I began my work as a translator quite by accident. It was 2012, and I was working in a small eikaiwa (English conversation school) in rural Japan. Drawn in by the extremely reasonable lesson fee (about 1/8th of what you'd pay in a big city), a local woman who worked for a Japanese yogurt company started coming in for lessons.

Well, actually, not conversation lessons--what she'd actually do is bring loads of operation manuals for industrial-sized yogurt processing machines. Her company actually produced yogurt-making and -packaging machines, and was in the process of exporting them to factories in Europe. She and I would sit down, and she'd explain the details of how a certain component of one of the machines worked, and I would transcribe it into English.

For me (at that time only a beginner-level Japanese speaker) it was so mentally challenging to try and grasp the concepts within the manuals, and so rewarding to be able to translate them into English. In retrospect, it was simple stuff, like "Once the bottles are filled, they pass onto CONVEYOR B," but for me, it was extremely fun and rewarding. For her? Well, considering the market rate for per-character translation of industrial machinery, all I can say is that she was saving her company quite a bit of money (possibly thousands of dollars) by outsourcing all the translation to a rural eikaiwa.

From that experience, I came to see the translation industry as something like a linguistic gymnasium-slash-play area, where I could have fun 1) grasping the general outline of the source sentence, then 2) creating an approximate copy in the target language. In the subsequent years (2014 - 2017) translating online news articles for Rosetta Stone Japan, this image did not change much. It was fun and challenging. 1) Read the source sentence. 2) Grasp the meaning. 3) Make an English version.

Only when I entered the world of industrial equipment manual translation (starting in late 2017) did I realize translation of technical documents is neither fun nor interesting, and certainly not something that humans should be put in charge of, at least not 100%. The problem is, it's extremely easy and repetitive. Which means it not only robs you of any feeling of accomplishment as a translator, but also makes you prone to errors (for example, say the word for "foundation bolt" occurs 200 times in a row, then suddenly "hanging bolt" appears--will you catch it?).

This is one of the reasons I developed GoogleCloud'n Glossary, one of the plugins available here on the site. It allows you to batch-translate documents within Trados Studio 2017 or 2019, via the Google API, using a glossary. I recently used it to batch translate a 250-page "Specification Manual" document, which was filled with page after page of content like:

Flap operation: active
Flap operation: inactive
Operating state
Individual setting display
Group setting display
Pi pin number
Po pin number
Active
Inactive
Freezing prevention OUT condition
Freezing prevention IN condition

Nearly 17,000 segments of this kind of content. Not the kind of thing you want to put a human translator in charge of. Sitting at a desk for hours and hours, typing short strings of nouns and adjectives like this, it's only a matter of time until one makes a term error.

The general-use Google API is great, but I've found it translates "運転" as either "operation" or "driving," depending on the context. Thanks to the glossary functionality in "GoogleCloud'n Glossary," I was able to make sure that "運転" was translated as "operation" 100% of the time across the nearly 17,000 segments, which saved a lot of post-editing work.
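
If you want to double-check that kind of consistency after the fact, a small script can do the looking for you. The following Python sketch is not part of the plugin. The Japanese source strings are made up for illustration, and `find_term_violations` is a hypothetical helper that flags any segment whose source contains a glossary term but whose translation lacks the required target term.

```python
def find_term_violations(segments, glossary):
    """Return (segment index, source term, expected target term) for each miss."""
    violations = []
    for index, (source, target) in enumerate(segments):
        for source_term, target_term in glossary.items():
            # Plain substring check, case-insensitive on the target side.
            if source_term in source and target_term.lower() not in target.lower():
                violations.append((index, source_term, target_term))
    return violations


# Made-up segment pairs; the second one uses the wrong term on purpose.
segments = [
    ("フラップ運転: 有効", "Flap operation: active"),
    ("フラップ運転: 無効", "Flap driving: inactive"),
    ("凍結防止 OUT 条件", "Freezing prevention OUT condition"),
]
glossary = {"運転": "operation"}

violations = find_term_violations(segments, glossary)
```

A plain substring check like this is crude: it would also flag a perfectly good rendering such as "Operating state," so treat the output as a list of segments to look at rather than a list of errors.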

Of course, using machine translation...it's something that many translators find psychologically unpleasant. Maybe you spent years and years becoming fluent in a language, maybe made 'translator' a large part of your identity; then suddenly you find that--at least when faced with strings of nouns and adjectives--the Google API can translate just as well as you (and more consistently to boot). I can sympathize with that. As outlined above, from 2012 until 2018, I had built up such an identity--priding myself on my ability to grasp the meaning of Japanese documents and create English equivalents. Then the ease and repetitiveness of industrial translation drove me almost entirely out of the field. Very little feeling of accomplishment, and a whole lot of error potential.

That being said, I still enjoy doing translation in my own head as a hobby (for example, while reading Japanese detective novels or watching TV). Parsing the meaning of a three-or-four line sentence with multiple objects and verb clauses--it brings me back to the 'linguistic gymnasium-slash-play area' that I so enjoyed many years ago.

But 17,000 segments of strings of nouns and adjectives, like the above? So much better to use "GoogleCloud'n Glossary!"

Browser-what?

Jan. 25th, 2020

Thank you for visiting browser-beijin.com! Regarding the URL address of this site, you are probably thinking one of the following things:

1) Is this the site of a Chinese travel agency?
2) Do they only arrange tours of Beijing?
3) Why did they misspell 'beijing' in the URL?

You would not be blamed for thinking one (or all three) of the above. As a company name, it's a bit confusing and, well, foreign-sounding. 'Why didn't they choose a general noun as a company name?' you might be wondering.

Have you ever wondered why "YouTube" is called "YouTube?" Turns out, the company originally began as a dating site where users could upload videos of themselves (hence the name "you tube") and make a profile with the aim of finding a partner. Starting out in this direction, they evidently purchased the URL address "youtube.com," probably thinking 'This URL aligns perfectly with our business model! Lucky!' Then, as we all know, YouTube went on to become a completely different kind of business.

Such was also the case with this company, Browser Beijin. In early 2018, I thought offering Japanese to English translation, in an online chat-room style environment, with a five-minute turnaround would be a great idea. A business with high demand, certainly. Based on my own years of experience in the translation industry, I was acutely aware of the tight deadlines of that industry (sometimes on the order of hours), as well as the constant small revisions made to the source documents that commonly result in hundreds of dollars in extra translation costs (Customer: "We would like to change 'attach the casters (4 locations)' to 'attach the casters in 4 locations.'" Translation Company: "Ok, that'll be...the minimum charge, which is $45."). "Who wouldn't want translation delivered within 5 minutes of order, with no minimum charge, and a per-character fee half the industry standard?!" is what I thought. So sure of my business model was I that I went and bought the domain name 'browser-beijin.com.'

You see, Beijin has the meaning of "American person" in Japanese. The image was "Hey, there's a beijin (American), right there in your web browser! HE'S THERE, waiting to respond to your translation request, right in your browser!" That, everybody, is where the name "Browser-beijin" came from. It was supposed to bring up the image of an American, native English speaker, accessible via a web browser, available at any time to take questions / translation requests.

Through the rest of 2018 and 2019, I stayed at my PC from 9am to 6pm, waiting for the translation requests to roll in. None ever came, as it turns out. However, while I was waiting, I had a lot of time to experiment with Trados Studio 2017, and learn to program in C# (the language in which Trados Studio plugins are written). In the end, I found plugin development not only more challenging and rewarding than translation, but also more in line with the trend in the industry. In my opinion, the industry as a whole is moving toward greater automation and precision--be that via greater & more thorough application of translation memories, more precise glossary management, or machine translation.

So, browser-beijin.com changed from a translation business to a translation-related plugin development business, hoping to offer plugins that will help translators be as productive as possible & deal with the changes in this ever-more-exacting industry.