
dotnet internationalization : Pseudo-Localize First


John
2/21/2005 10:29:36 AM
We're introducing a new approach for producing internationalized and
localized software. We invite criticism and comment on this
"pseudo-localize first" strategy.
Does it pass the "laugh test"?

Pseudo-languages are nothing new in the development of world-ready
software. For years companies have invested money in pseudo-localizing
their primary interfaces in order to certify localizability. We'll
show how companies benefit enormously when developers create UIs in a
pseudo-language and leave primary languages for logic and
language-invariant internals.

Why do we believe so strongly in this "pseudo-localize first"
concept?

One: For all the same reasons that pseudo-languages are employed today.
They are rich test languages, capable of identifying a whole
class of localizability defects (font issues, untranslated UI,
over-translated text with resulting functional breakdown, and
double-byte processing errors) that testing in a single-byte primary
language rarely surfaces.

Equally important, pseudo interfaces behave like native UIs and are
equally well understood by quality assurance teams. A key component of
"pseudo-localize first" is that the majority of QA time and money
targets a pseudo-localized rather than a primary-language interface.
The goal is for organizations to view the primary interface as just
what it is -- simply one skin of many that will clothe the final
application.

Finally, armed with the pseudo-language version of their product,
developers conduct unit tests that expose cultural assumptions early
in the development process.

Two: If we limit our focus to English and Western European primary
languages, there is a rich set of visually similar Unicode
characters corresponding to each character of source text. Why is this
important? We've discovered that the available character substitutions
can be exploited: combinations of Unicode characters, when strung
together, become unique record identifiers that differentiate word
sense and link source text to precise entries within a terminological
database.
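
A minimal sketch of how such an encoding could work, in Python. Everything
in it is an assumption made for illustration and not a description of the
product: the small VARIANTS table, the mixed-radix scheme, and the names
encode and decode are hypothetical. The choice of look-alike for each
eligible letter carries one digit of a numeric term-entry ID.

```python
from typing import Tuple

# Hypothetical look-alike table: each base letter maps to visually similar
# Latin-1 Supplement characters. A real table would cover far more letters.
VARIANTS = {
    "a": "àáâãäå",
    "e": "èéêë",
    "i": "ìíîï",
    "o": "òóôõö",
    "u": "ùúûü",
    "y": "ýÿ",
}

# Reverse lookup: pseudo character -> (base letter, digit, radix).
REVERSE = {
    ch: (base, digit, len(chars))
    for base, chars in VARIANTS.items()
    for digit, ch in enumerate(chars)
}


def encode(text: str, term_id: int) -> str:
    """Hide term_id in the choice of look-alike for each eligible letter."""
    if term_id < 0:
        raise ValueError("term id must be non-negative")
    out = []
    remaining = term_id
    for ch in text:
        chars = VARIANTS.get(ch)
        if chars is None:
            out.append(ch)  # letter has no variants, carries no information
            continue
        # Peel off one mixed-radix digit and pick the matching variant.
        remaining, digit = divmod(remaining, len(chars))
        out.append(chars[digit])
    if remaining:
        raise ValueError("text too short to carry this term id")
    return "".join(out)


def decode(pseudo: str) -> Tuple[str, int]:
    """Recover the primary-language text and the term id from pseudo text."""
    base_chars = []
    term_id = 0
    weight = 1
    for ch in pseudo:
        if ch in REVERSE:
            base, digit, radix = REVERSE[ch]
            base_chars.append(base)
            term_id += digit * weight
            weight *= radix
        else:
            base_chars.append(ch)
    return "".join(base_chars), term_id


print(encode("cancel order", 7))   # cáncél òrdèr
print(decode("cáncél òrdèr"))      # ('cancel order', 7)
```

With this toy table "cancel order" has four eligible letters and can
distinguish 6 * 4 * 5 * 4 = 480 entries; uppercase letters and consonants
are left untouched here, which a fuller table would need to address.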

Within the localization workflow, these links to precise term entries
provide context or sense metadata for:

1) Human translators, so that first-time accuracy is assured without
   screenshots.
2) Product marketers and/or developers, so that they can leverage
   translations simply by choosing an explicit meaning from the
   terminological database.
3) Machine translation engines, so that they can generate more accurate
   output because fewer words and phrases are ambiguous.

Three: In this workflow, translation becomes automated and file-format
invariant. The Unicode character blocks used by the pseudo-language are
readily distinguished from primary-language regions regardless of file
format. We can safely become file-format agnostic -- unaware of and
unconcerned about file specifics, now and forever. Content change is
detected continuously as source files containing localizable content
are uploaded via a portal or web service. Localizable regions are
forwarded to a machine translation engine, which takes a first cut at a
complete translation with the aid of the unambiguous content. The
accuracy of MT output is enhanced, and as a final processing step
incomplete translations are forwarded to human translators for review
and touch-up.

Once reviewed by a human, translations are sent back to the
linguistic assets server and inserted into language files based on
language pair activation. Language pair activation and scheduling are
managed through the portal.
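
To illustrate the format-agnostic detection step, here is a rough Python
sketch. It assumes, purely for the example, that pseudo text draws on the
accented Latin letters between U+00C0 and U+024F and that a localizable
region is a run of letters and spaces containing at least one such
character. The function name localizable_regions is hypothetical.

```python
import re

# Assumed pseudo range: accented Latin letters, skipping the two
# non-letters U+00D7 (multiplication sign) and U+00F7 (division sign).
PSEUDO = r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F"

# A candidate run starts with a letter and continues over letters and spaces.
RUN = re.compile(rf"[A-Za-z{PSEUDO}][A-Za-z{PSEUDO} ]*")
HAS_PSEUDO = re.compile(rf"[{PSEUDO}]")


def localizable_regions(content: str):
    """Return (start, end, text) for each run containing pseudo characters."""
    regions = []
    for match in RUN.finditer(content):
        text = match.group().rstrip()
        if HAS_PSEUDO.search(text):
            regions.append((match.start(), match.start() + len(text), text))
    return regions


# The same scan works on XML resources and on key=value files alike.
resx = '<data name="btnCancel"><value>Cáncél òrdèr</value></data>'
props = "cancel.label=Cáncél\nlog.level=DEBUG"
print([text for _, _, text in localizable_regions(resx)])
print([text for _, _, text in localizable_regions(props)])
```

The scan never parses the file, which is the point of the approach. The
trade-off is that it relies on delimiters such as quotes, tags, or "="
to separate pseudo text from neighboring primary-language words.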

The flow is hands-free and invariant to volume. Localization kits,
with their infrequent drops and associated latency-related scheduling
nightmares, become a thing of the past.

Four: Importantly, our tools give ubiquitous access to terminological
resources. Since localizable text exists across a wide variety of file
and database formats, it's imperative that a terminological database
reach into every editor and database visualization tool on the
computing platform. Our architecture ensures that the conversion of
primary text to pseudo can be done equally well in Visual Studio,
Outlook, LoadRunner, WordPerfect or Oracle. There are no application
barriers to accessing your corporate terminological resources.

Five: The terminological database is hosted on an on-demand model. Any
authorized partner (for example, distributors in Shanghai, developers in
Pune or marketers in San Jose) can share and leverage standardized
terminological resources.

Your comments on this approach would be greatly appreciated.

John Glosson
Precise Term Software
www.preciseterm.com
Mihai N.
2/21/2005 10:54:15 PM

There is nothing to laugh about.
But there is no "new approach" either.

--
Mihai Nita [Microsoft MVP, Windows - SDK]
------------------------------------------
John
2/22/2005 12:29:37 AM
I've heard of larger software development firms advocating a
pseudo-first approach to their development, but I have not seen
commercial solutions that use pseudo codes as unique keys into on-demand
terminological databases. This is what I consider to be unique about
this approach.

John
John
2/23/2005 9:06:55 AM
Historically, pseudo tools have been used after the primary-language
UI is complete. The functions that you mention all
contribute to pseudo's power as a test language but NOT as a means to
automate localization or to eliminate sense ambiguity within source.

In this workflow, the introduction of the pseudo language is in the
requirements phase, not engineering. Here, we have product marketing
(the true owners of meaning and language on a UI) using the tool to
convey instructions to downstream programmers and translators on 1)
what to place on the UI and 2) how to translate it exactly.

This technique covers the features that you mentioned but increases the
scope and importance of the pseudo-language. It becomes the principal QA
language within the organization and lessens the importance of the
primary language of the development staff. The primary language of the
dev staff is used only for "language-invariant" strings within the
product. All "language-variant" strings, by contrast, are clearly
differentiated by the switch to pseudo. This switch eliminates over-
and under-translation of text as a source of error in global products.

John Glosson
Heikki Korpisalo
2/23/2005 1:05:53 PM

You made very good points in your post. However, some professional
localization tools have had pseudo functions* ready for years, and they
are widely used in software UI design and QA.

*) E.g. cover fill, minimum/maximum fill, expanding percentages, diacritics,
variable character cases, etc.
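
For readers unfamiliar with these functions, a small Python sketch of two
of them (diacritics and an expansion percentage) follows. It is an
illustration only and not the behavior of any particular tool: the
function name pseudo_localize, the vowel table, and the "~" filler are
assumptions.

```python
import math

# Assumed mapping from ASCII vowels to accented look-alikes.
ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")


def pseudo_localize(text: str, expansion: float = 0.3, filler: str = "~") -> str:
    """Accent the vowels and pad the string by the given expansion ratio."""
    # Round up so that even short strings grow by at least one character
    # whenever the expansion ratio is positive.
    padding = math.ceil(len(text) * expansion)
    return text.translate(ACCENTED) + filler * padding


print(pseudo_localize("Save file", expansion=0.5))  # Sávé fílé~~~~~
```

A production version would also need to skip format placeholders such as
{0}, which this sketch would alter if they contained vowels.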

--
Check out also Developer Zone at <www.multilizer.com/dev>!
Code snippets, technology backgrounders, how to's, etc.

Best Regards,

Heikki Korpisalo
Multilizer Oy