We're introducing a new approach for producing internationalized and
localized software. We invite criticism and comment on this
"pseudo-localize first" strategy.
Does it pass the "laugh test"?
Pseudo-languages are nothing new in the development of world-ready
software. For years, companies have invested money in pseudo-localizing
their primary interfaces in order to certify localizability. We'll
show how companies benefit enormously when developers create UIs in a
pseudo language and leave primary languages for logic and
language-invariant internals.
Why do we believe so strongly in this "pseudo-localize first"
concept?
One: For all the same reasons that pseudo-languages are employed today.
They are rich test languages, capable of exposing a whole class of
localizability defects (font issues, untranslated UI, over-translated
text with resulting functional breakdown, and double-byte processing
errors) that testing in a single-byte primary language rarely
surfaces.
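As a rough illustration of what a pseudo-language does to source text,
here is a minimal Python sketch. The substitution table, the bracket
markers, the "~" padding and the 30% expansion factor are all
assumptions chosen for the example; they are not taken from our
product.

```python
# Hypothetical look-alike table: each ASCII letter on the first line is
# replaced by the accented character in the same position on the second.
LOOKALIKES = str.maketrans(
    "AEIOUaeioucn",
    "ÀÉÎÕÜàéîõüçñ",
)


def pseudo_localize(text: str, expansion: float = 0.3) -> str:
    """Return a bracketed, padded, accent-substituted copy of text."""
    swapped = text.translate(LOOKALIKES)
    # Padding imitates the growth in length that real translations cause.
    padding = "~" * max(1, round(len(text) * expansion))
    # Brackets make clipped or concatenated strings easy to spot on screen.
    return f"[{swapped}{padding}]"


if __name__ == "__main__":
    print(pseudo_localize("Save file"))
```

A string that appears on screen without brackets or accents was never
routed through the resource files, and a string whose closing bracket
is missing has been clipped by the layout.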
Equally important, pseudo interfaces behave like native UIs and are
equally well understood by quality assurance teams. A key component of
"pseudo-localize first" is that the majority of QA time and money
targets a pseudo-localized rather than a primary language interface.
The goal is for organizations to view the primary interface as just
what it is -- simply one skin of many that will clothe the final
application.
Finally, armed with the pseudo language version of their product,
developers conduct unit tests that expose cultural assumptions early
within the development process.
Two: If we limit our focus to English and Western European primary
languages, there is a rich set of visually similar Unicode
characters corresponding to each character of source text. Why is this
important? We've discovered that available character substitutions can
be exploited and that combinations of Unicode characters, when strung
together, become unique record identifiers that differentiate word
sense and link source text to precise entries within a terminological
database.
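To make the idea concrete, here is one way such an identifier could be
carried by the choice of look-alike characters. This is only a sketch
under stated assumptions: the variant table, the mixed-radix numbering
and the names encode and decode are invented for the example, the
source text is assumed to be plain ASCII, and our actual encoding may
differ.

```python
# Hypothetical variant table: each vowel has four look-alikes, so every
# substituted vowel carries one base-4 digit of the record identifier.
VARIANTS = {
    "a": "àáâã",
    "e": "èéêë",
    "i": "ìíîï",
    "o": "òóôõ",
    "u": "ùúûü",
}
REVERSE = {
    variant: (base, digit)
    for base, variants in VARIANTS.items()
    for digit, variant in enumerate(variants)
}


def encode(text: str, term_id: int) -> str:
    """Hide term_id in the choice of look-alike for each vowel."""
    out, remaining = [], term_id
    for ch in text:
        variants = VARIANTS.get(ch)
        if variants is None:
            out.append(ch)  # no look-alike defined; keep as-is
            continue
        remaining, digit = divmod(remaining, len(variants))
        out.append(variants[digit])
    if remaining:
        raise ValueError("too few substitutable letters for this id")
    return "".join(out)


def decode(pseudo: str) -> tuple[str, int]:
    """Recover the plain source text and the embedded term_id."""
    plain, term_id, weight = [], 0, 1
    for ch in pseudo:
        if ch in REVERSE:
            base, digit = REVERSE[ch]
            term_id += digit * weight
            weight *= len(VARIANTS[base])
            plain.append(base)
        else:
            plain.append(ch)
    return "".join(plain), term_id


if __name__ == "__main__":
    # Two senses of "file" (ids 1 and 2 are made-up database keys).
    print(encode("file", 1), encode("file", 2))
    print(decode(encode("open file", 7)))
```

With this table, a string containing n substitutable vowels can
distinguish 4**n entries, so short strings need a larger variant set
than the one shown here.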
Within the localization workflow, these links to precise term entries
provide context or sense meta-data for:
1) Human translators, so that first-time accuracy is assured without
   screenshots.
2) Product marketers and/or developers, so that they can leverage
   translations simply by choosing an explicit meaning from the
   terminological database.
3) Machine translation engines, so that they can generate more accurate
   output because fewer words and phrases are ambiguous.
Three: In this workflow, translation becomes automated and file format
invariant. Unicode character blocks are readily distinguished from
primary language regions regardless of file format. We can safely
become file format agnostics -- unaware and unconcerned about file
specifics, now and forever. Content change is detected continuously as
source files containing localizable content are uploaded via a portal
or web service. Localizable regions are forwarded to a machine
translation engine, which takes a first cut at a complete translation
with the aid of the unambiguous content. This improves the accuracy of
the MT output. As a final processing step, incomplete translations are
forwarded to human translators for review and touch-up.
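The format-agnostic detection step can be sketched as follows. This is
an assumption-laden illustration, not our production scanner: it treats
any run of letters and single spaces that contains at least one
non-ASCII character as a localizable region, and the function name
localizable_regions is hypothetical.

```python
import re

# A phrase is one or more words made only of letters, separated by
# single spaces. Markup, quotes, digits and punctuation end a phrase.
PHRASE = re.compile(r"[^\W\d_]+(?: [^\W\d_]+)*")


def localizable_regions(raw: str) -> list[tuple[int, int, str]]:
    """Return (start, end, text) for each phrase with non-ASCII letters."""
    return [
        (m.start(), m.end(), m.group())
        for m in PHRASE.finditer(raw)
        if not m.group().isascii()  # pure-ASCII phrases are code or markup
    ]


if __name__ == "__main__":
    samples = [
        '<string name="open">õpén fìlè</string>',  # XML resource
        '{"title": "õpén fìlè", "id": 7}',         # JSON resource
        'label = "õpén fìlè"',                     # source code
    ]
    for sample in samples:
        print(localizable_regions(sample))
```

The same function is applied to all three samples without any parser
for the surrounding format, which is the point of the approach. A real
scanner would also have to cope with source files that legitimately
contain non-ASCII text outside localizable regions.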
Once reviewed by a human, translations are sent back into the
linguistic assets server and inserted into language files based on
language pair activation. Language pair activation and scheduling are
managed through the portal.
The flow is hands-free and invariant to volume. Localization kits,
with their infrequent drops and associated latency-related scheduling
nightmares, become a thing of the past.
Four: Importantly, our tools give ubiquitous access to terminological
resources. Since localizable text exists across a wide variety of file
and database formats, it's imperative that a terminological database
reach into every editor and database visualization tool on the
computing platform. Our architecture ensures that the conversion of
primary text to pseudo can be done equally well in Visual Studio,
Outlook, LoadRunner, WordPerfect or Oracle. There are no application
barriers to accessing your corporate terminological resources.
Five: The terminological database is hosted via an on-demand model. Any
authorized partner (for example, distributors in Shanghai, developers
in Pune or marketers in San Jose) can share and leverage standardized
terminological resources.
Your comments on this approach would be greatly appreciated.
John Glosson
Precise Term Software
www.preciseterm.com