CAT tool software is not good enough for Europe

And why should it be? Decisions coming from Brussels tend to be misunderstood, shallowly analyzed or directly criticized whichever way the wind blows.

Let us remember 2010’s first-ever report on the Size of the Translation Industry in Europe. It gave a very comprehensive view of the state of the industry, country by country, with facts and figures across several areas, even if the revenue figures could only take certain activities into account. It also contained comments and forecasts from personalities in the industry. Liking reports is like choosing a favorite color: everyone has their own preference. Nevertheless, it provided detailed information where there was none.

However, the decision not to award the contract to any CAT tool on the market points to a very clear state of affairs in the language industry. Despite massive innovations in computing (from open clouds to internal or managed clouds: Eucalyptus (compatible with Amazon EC2), OpenNebula, the solid Ubuntu Enterprise Cloud and the latest arrival, which I expect to be a winner, OpenStack), the advent of SaaS models and even great advances in machine translation, no existing tool is exciting enough to justify spending 5M€ of taxpayers’ money.

The story goes like this: in early 2010, the EU’s Directorate General for Translation (DGT) published a call for tenders to replace the existing CAT system (Trados 2007) with more modern technology. It is safe to assume that all the major players in the CAT market submitted a tender according to the specifications. The specifications may have been more or less to the bidders’ liking, but every administration, as the custodian and granter of public funds, has to administer them wisely.

Given the fact that there are some 4,600 staff translators working at the EU, and that the EU is by far the largest producer and consumer of translation services, the backing of one option over others would have set a massive market trend for the years to come.

I had the pleasure of presenting open-source MT solutions as an invited speaker last April in Brussels. I saw first-hand the internal drive to introduce Moses and Apertium as solutions that can set a minimum standard upon which to build (or at least set a trend). I was particularly impressed by the work done internally by the Portuguese Department, which, with minimal staff and resources, was able to set up a small Moses-based solution that fitted their needs, giving preference to particular translation domains by choosing translation tables. They also did this following a TMX workflow and TM update with penalisation, which mirrors our own early stages in MT.
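To make "TM update with penalisation" a little more concrete, here is a minimal sketch of the general idea: segments that entered the translation memory from MT output are scored lower than human translations when looking for matches. This is not the Portuguese Department's actual setup. The `TMEntry` class, the `origin` labels, the `MT_PENALTY` value and the 0.7 threshold are all assumptions made for illustration, and the similarity measure is simply Python's standard `difflib`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TMEntry:
    source: str
    target: str
    origin: str  # hypothetical label: "human" or "mt"


# Assumed penalty subtracted from the similarity of MT-origin entries.
MT_PENALTY = 0.15


def score(query: str, entry: TMEntry) -> float:
    """Character-based similarity in [0, 1], penalised for MT-origin entries."""
    similarity = SequenceMatcher(None, query.lower(), entry.source.lower()).ratio()
    if entry.origin == "mt":
        similarity -= MT_PENALTY
    return max(similarity, 0.0)


def best_match(query: str, memory: list[TMEntry], threshold: float = 0.7):
    """Return the highest-scoring entry, or None if nothing reaches the threshold."""
    best = max(memory, key=lambda entry: score(query, entry), default=None)
    if best is None or score(query, best) < threshold:
        return None
    return best


memory = [
    TMEntry("Close the door.", "Fechar porta.", "mt"),
    TMEntry("Close the door.", "Feche a porta.", "human"),
]
match = best_match("Close the door.", memory)
print(match.target if match else "no match")
```

With two identical source segments in the memory, the human translation wins because the MT-origin entry carries the penalty.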

I could only congratulate and praise their work. Other presentations, by Dr Sharon O’Brien and Dr Andreas Eisele, pointed out that translator acceptance is key, as productivity increases are nowadays beyond question (whether they are 30%, 50% or 300% is still a matter for the case studies in in-domain machine-translation presentations). The progress made internally at the EU was presented and reported at TAUS Barcelona (see previous blog entry for a summary).

Going back to the decision not to replace the existing CAT tool, the message is clear:

There is no justifiable leap in quality in existing CAT tools.
CAT leveraging, as a technique to make the most from previous translations, has reached its ceiling.
There is a lack of on-line help documentation in CAT tools.
There is hardly any justification for more CAT tools unless they offer something truly revolutionary. (Now, there are several new tools which do make sense at LSP and corporate level, but not to the extent and cost the Directorate General for Translation required).

Some have suspected a hidden hand and there has been controversy, finally settled when the chairwoman of the committee explained why none of the candidates was chosen. The explanation was made public in The Tool Kit. There was simply a lack of adherence to the requirements and no true innovation.

However, there are plenty of CAT tools on the market. Are there any plans for future in-house development? Personally, my favorite CAT tool has long been Swordfish: it is nimble, agile, easy to use, built on and favoring open standards, compatible with all major formats, and it contains good QA features. The latest version even adds a powerful LAN TM collaborative option.

Its useful Google Translate plug-in will probably be hit by Google’s decision to deprecate its free API, something that had been on the cards for some time. Even so, at a fraction of the cost of other tools, Swordfish gets the job done pretty well. Sadly, it is not based in Europe and most probably did not enter the tender.

To conclude, I am not only justifying the DGT’s decision not to award a 5M€ contract to any CAT tool provider (not even the latest versions of Trados in the shape of SDL 2009 made it). I am saying that it was the only likely outcome. Why? Look around at the new machine-translation offerings and how these will become essential, and perhaps part of everyday life, in a matter of years (see the possible future described by Andrew Joscelyne, in which machine-translation engine creation becomes so easy and so common that it is the main work LSPs have to offer).

Look at how the gathering of language resources is being automated by initiatives such as Panacea or the work done for lesser-resourced languages by Let’sMT and Tilde in particular.

Any doubts? Think about the advantages of an integrated, stable XLIFF workflow for documentation which not only leverages your existing content from a fast database system (not a “translation memory”), but also uses it to create your own MT ecosystem in the background, growing with every new translation you feed into it. Apologies for the DIY SMT self-promotion :).
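As a rough illustration of how an XLIFF workflow can feed an MT engine in the background, the sketch below pulls finished source/target pairs out of an XLIFF 1.2 document so they could be appended to a parallel training corpus. It is only a sketch under assumptions: the sample document, the `translated_pairs` function name and the choice of which `state` values count as "finished" are mine, and a real pipeline would also need to handle inline tags, deduplication and storage.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Assumption: only these target states are treated as ready for MT training.
ACCEPTED_STATES = {"translated", "final", "signed-off"}

# Hypothetical sample file content, for illustration only.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="manual.txt" source-language="en" target-language="pt" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Close the door.</source>
        <target state="translated">Feche a porta.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Open the window.</source>
        <target state="new"></target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def translated_pairs(xliff_text: str) -> list[tuple[str, str]]:
    """Collect (source, target) text pairs from finished trans-units."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        if source is None or target is None:
            continue
        # itertext() flattens any inline markup down to plain text.
        src_text = "".join(source.itertext()).strip()
        tgt_text = "".join(target.itertext()).strip()
        if src_text and tgt_text and target.get("state") in ACCEPTED_STATES:
            pairs.append((src_text, tgt_text))
    return pairs


for src, tgt in translated_pairs(SAMPLE):
    print(f"{src}\t{tgt}")
```

Each new batch of finished translations would add lines to the corpus, which is the sense in which the MT side grows with the content fed into it.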

So, how long until we see a really ground-breaking, open-source CAT+MT ecosystem tool?