www-community mailing list archives

From Robert Simpson <rsimp...@verizon.net>
Subject [i18n] Internationalization project
Date Mon, 14 Jul 2003 06:21:49 GMT
To: Community@Apache.org

On the Jakarta General list, we've been discussing the possibility of introducing an "Internationalization"
project into incubation.  It seems the consensus is that it should be targeted for a top-level
programming-language-independent and spoken-language-independent Apache project, rather than a
Jakarta subproject.

(To anyone on the JG list: I used a blind CC so that this is the only message on Community@Apache.org
which should be CCd to JG.  You can set up message filters on "[i18n]" on both lists to follow
the discussions in either place....)

A preliminary organization of the project based on the JG discussions is included in my message
below.

I don't mind "spearheading" the incubation myself.  Is there anyone else interested whom we
can add to the list of contributors (see A through F below)?  Is there anything else we should
consider before requesting entry into incubation?

TIA.
Robert Simpson

-------- Original Message --------
Subject: Re: [i18n] Internationalization subproject sponsor?
Date: Sun, 13 Jul 2003 21:32:36 +0100
From: robert burrell donkin <robertburrelldonkin@blueyonder.co.uk>
Reply-To: "Jakarta General List" <general@jakarta.apache.org>
To: "Jakarta General List" <general@jakarta.apache.org>

On Monday, July 7, 2003, at 01:14 PM, Robert Simpson wrote:

<snip>

> I am surprised there isn't more interest in a common internationalization 
> framework within Jakarta.  But then I have been assuming that there are 
> non-English-speaking "members" in Jakarta, not just "committers" and 
> other users of the code.

i think that there are several jakarta members who are not native english 
speakers. as Tetsuya Kitahata pointed out there are far fewer members than 
committers and i'm not sure whether there are any jakarta members who are 
native speakers of non-latin languages. it takes a lot of energy to 
spearhead an incubation and it's a big commitment for a member to make.

but i don't think that the member would have to come from jakarta (even if 
that's where those people involved with the product hope that it will end 
up). i wonder whether you might have more luck finding a sponsor over in 
xml-land. since many of their products are multi-language a common i18n 
framework may be of more pressing importance than here. i also have an 
idea that there are members whose native languages are non-latin.

i like the idea of an apache wide i18n project along the lines suggested 
by Tetsuya Kitahata.

- robert

-------- Original Message --------
Subject: Re: [i18n] Internationalization subproject
Date: Sat, 12 Jul 2003 08:55:00 -0400
Reply-To: "Jakarta General List" <general@jakarta.apache.org>,Rob.Simpson@iToolSet.com
To: Jakarta General List <general@jakarta.apache.org>
References: <20030703152310.35921.qmail@jarre.pair.com> <20030705130911.FB28.TETSUYA@apache.org>
<3F09640E.3E6D343F@verizon.net> <3F0AF212.7030004@hisitech.com> <3F0C07E6.CB967B2@verizon.net>
<3F0EE794.8080801@hisitech.com>

WRT Santiago's point about keeping the different translations in sync, the solution is to
have each word/phrase in (1) or each section in (2) identified in the XML with a version number.
 Then it would be a simple matter to have a program compare the two documents, and indicate
where the translation needs to be updated (the program could even provide an initial translation
of the section via machine translation, to be refined by the human translator).  The XML should
also indicate who made each change and whether a change was prompted by a need to change the
document (additions to content, for example) or as a translation of another version.  That
way, no particular translation would have to be the "primary" document, and any conflicts
could be identified and handled.  For example, a Spanish-speaking person could add a missing
section to the Spanish translation of a document, and that section could then be translated
back into the original and other translations.  This arrangement could also handle "proposed"
additions (the XML equivalent of "I, a Spanish translator, propose to add a new section here"),
which could be commented on (ex: "that section would be better placed over there") and/or
voted on by translators of other languages, etc....
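
As a rough illustration of the comparison program described above, here is a minimal sketch. It is written in Python only for brevity, and the XML layout is purely an assumption made for the example: the element name "section" and the attributes "id", "version" and "source-version" are hypothetical, not an agreed format. Each section carries its own version number, and a translated section records which version of the other document it was translated from.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: every <section> has an id and a version number.
# A translated section also records the version it was translated from.
ENGLISH = """
<document lang="en">
  <section id="intro" version="3">Welcome to the project.</section>
  <section id="install" version="2">Unpack the archive.</section>
  <section id="faq" version="1">Frequently asked questions.</section>
</document>
"""

SPANISH = """
<document lang="es">
  <section id="intro" version="1" source-version="3">Bienvenido al proyecto.</section>
  <section id="install" version="1" source-version="1">Descomprima el archivo.</section>
  <section id="licencia" version="1">Texto de la licencia.</section>
</document>
"""


def sections(xml_text):
    """Map section id -> element, in document order."""
    root = ET.fromstring(xml_text)
    return {s.get("id"): s for s in root.iter("section")}


def compare(source_xml, target_xml):
    """Report where target_xml needs work relative to source_xml."""
    src = sections(source_xml)
    tgt = sections(target_xml)
    report = {"missing": [], "outdated": [], "untranslated_back": []}
    for sid, s in src.items():
        t = tgt.get(sid)
        if t is None:
            # Section exists in the source but has no translation yet.
            report["missing"].append(sid)
        elif int(t.get("source-version", "0")) < int(s.get("version")):
            # Translation was made from an older version of the source.
            report["outdated"].append(sid)
    for sid in tgt:
        if sid not in src:
            # Section was added in the translation first; it could be
            # translated back, so neither document has to be "primary".
            report["untranslated_back"].append(sid)
    return report


if __name__ == "__main__":
    print(compare(ENGLISH, SPANISH))
```

Running the comparison in the other direction (Spanish as the source) would treat the Spanish-only section as "missing" from the English document, which is how a section added by a translator could flow back into the original. Recording who made each change and handling "proposed" additions are left out of this sketch.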

Am I getting the feeling right that the Internationalization project would be ultimately targeted
for a top level, multiple-programming-language Apache project?  If so, I think the best approach
would be to get the Java support done first, to demonstrate its viability and usefulness.
 But still, from the start, the intent should be to design with language-independence as the
ultimate goal.

So, in summary, the organization of the project would be:

1. code common to both (2) and (3)
1.1 code
    This would include any code that supports both (2) and (3), such as the code to do comparisons
between translations
1.1.1 any programming-language-neutral stuff (configuration files, XML, etc)
1.1.2 Java
1.1.2.1 source code
1.1.2.1.1 source code contributors (committers)
1.1.3+ other programming languages, similarly

2. user interface internationalization (words and phrases)
2.1 code
    This would include the code to generate programming-language-specific resources, and provide
access to those resources
2.1.1 any programming-language-neutral stuff (configuration files, XML, etc)
2.1.2 Java
2.1.2.1 source code
2.1.2.1.1 source code contributors (committers)
2.1.2.2 resources (translations, generated from XML)
2.1.3+ other programming languages, similarly
2.1.3+.1 source code for other programming languages
2.1.3+.2 resources for other programming languages (translations, generated from XML)
2.2 language translations (programming-language-neutral)
2.2.1 any spoken-language-neutral stuff (all-language distribution files, JUnit tests for
file verification, etc)
2.2.2 English language translations (initial "source" translations)
2.2.2.1 XML format
2.2.2.1.1 English language translators (committers)
2.2.2.2 English user references
2.2.2.2.1 XML formatted user reference (generated, XSL-FO?)
2.2.2.2.2 HTML formatted user reference (generated, possibly with a doclet)
2.2.2.2.3 PDF formatted user reference (generated, possibly from XML user reference using
Apache XML-FOP)
2.2.3+ other spoken languages, similarly

3. internationalization of complete documents
3.1 code
    This would include code or tools (possibly making use of other Apache code) to generate
specific document file formats
3.1.1 any programming-language-neutral stuff (configuration files, XML, etc)
3.1.2 Java
3.1.2.1 source code
3.1.2.1.1 source code contributors (committers)
3.1.3+ other programming languages, similarly
3.1.3+.1 source code for other programming languages
3.2 language translations (programming-language-neutral)
3.2.1 any spoken-language-neutral stuff (all-language distribution files, JUnit tests for
file verification, etc)
3.2.2 English language translations (initial "source" translations)
3.2.2.1 XML format (based on XSL-FO?)
3.2.2.1.1 English language translators (committers)
3.2.2.2 HTML format (generated)
3.2.2.3 PDF format (generated, possibly using Apache XML-FOP)
3.2.2.4+ other document file formats (generated)
3.2.3+ other spoken languages, similarly

The main difference between sections (2) and (3) is that (2) is organized primarily by programming
language, with the programming-language-specific resources as part of the first subsection
(2.1) keeping the second section (2.2) programming-language-neutral, while (3) is organized
primarily by spoken language, with the programming-language-independent file formats as part
of the second subsection (3.2), keeping them separate from the programming-language-specific
stuff in the first subsection (3.1).
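
To make the split in (2) a little more concrete, the following is a minimal sketch of the step that would generate a programming-language-specific resource (here, the text of a Java .properties file) from programming-language-neutral XML. It is in Python only for brevity, and everything about the XML is assumed for illustration: the element names "phrases", "phrase" and "text" and the attributes "key" and "lang" are hypothetical, as is the bundle name mentioned afterwards.

```python
import xml.etree.ElementTree as ET

# Hypothetical programming-language-neutral source (2.2 in the outline):
# one <phrase> per UI string, with one <text> per spoken language.
PHRASES = """
<phrases>
  <phrase key="greeting">
    <text lang="en">Hello</text>
    <text lang="es">Hola</text>
  </phrase>
  <phrase key="farewell">
    <text lang="en">Goodbye</text>
  </phrase>
</phrases>
"""


def to_properties(xml_text, lang):
    """Return key=value lines for one spoken language.

    Phrases with no translation in that language are simply omitted.
    Escaping of special and non-ASCII characters is not handled here.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for phrase in root.iter("phrase"):
        for text in phrase.iter("text"):
            if text.get("lang") == lang:
                lines.append(f"{phrase.get('key')}={text.text or ''}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(to_properties(PHRASES, "es"))
```

The output for "es" could be saved as, say, Messages_es.properties (2.1.2.2 in the outline). Leaving out untranslated phrases is a deliberate choice in this sketch: Java's ResourceBundle falls back to the parent bundle for keys that a locale-specific bundle lacks. A generator for another programming language would read the same XML and emit that language's own resource format.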

I'd be willing to work on the common code and user interface code, and it looks like there
is a good starting list for the language translators.  Is there anyone willing to drive the
second part, the internationalization of complete documents?

I can also update the proposal as indicated above, and then let it be reviewed/modified
here, or in CVS somewhere.  In your replies to the mailing list, please indicate in which
of the following ways you might be willing to contribute:

A) committer for code for internationalization of user interface and possibly common code
B) committer for code for internationalization of complete documents and possibly common code
C) language translation (either or both UI or documents)
D) sponsor entry of Java version of Internationalization subproject into Jakarta
E) incorporate internationalization into another Apache/Jakarta sub/project (please specify)
F) none of the above

Robert Simpson

Santiago Gala wrote:

> Robert Simpson wrote:
> > Santiago Gala,
> >
> > As far as document and resource translation goes, I'm not sure if you are
> > referring to machine translation, or human translation.  My focus has
> > been on human translation, mainly because machine translation is
> > still pretty far from perfect.  There could be APIs for interfaces to
> > various machine translation tools, such as Systransoft, but I think
> > that should be a later, secondary priority.  Even if there was
> > support for machine translation, I would prefer that it could be
> > augmented by human proofreading and revision.  So it's probably just
> > as easy to let the language translator use whatever machine
> > translation tool s/he prefers.
> >
>
> David Taylor has already answered WRT code.
>
> I was thinking mostly about having a "pool" of people who can translate
> and are more or less "cross project". For instance, I can translate
> English to Spanish, and I'm a committer in Jetspeed, but I could also
> translate, say, parts of the tomcat documents that I'm reading, or some
> XML stuff I'm interested in. Or even docs for Apache modules.
>
> The good part is that it would help the whole community, both WRT
> translation efforts and WRT crosspollination, as these kinds of people
> will "see" beyond their small project(s). Also, it could bring new kinds
> of developers (Today I heard on the radio, coming home, that 72% of
> people in Spain cannot speak *any* foreign language. We are a bad sample
> but in most of Europe, fewer than 50% of people speak English.)
>
> The problem is that I can't see clearly how to implement such a
> crosscutting service/project, in ways that would not be difficult to
> impossible to manage. Especially since we should keep source control on
> both the original doc and the translations in sync.
>
> Any ideas?
>
> Regards
> --
> Santiago Gala
> High Sierra Technology, S.L. (http://hisitech.com)
> http://memojo.com?page=SantiagoGalaBlog

---------------------------------------------------------------------
To unsubscribe, e-mail: community-unsubscribe@apache.org
For additional commands, e-mail: community-help@apache.org

