incubator-ooo-dev mailing list archives

From Dave Fisher <dave2w...@comcast.net>
Subject Re: Pootle Data
Date Thu, 09 Feb 2012 16:24:38 GMT

On Feb 9, 2012, at 8:04 AM, Andre Fischer wrote:

> Hi Dave,
> 
> On 09.02.2012 16:45, Dave Fisher wrote:
>> Hi Andre,
>> 
>> On Feb 9, 2012, at 7:09 AM, Andre Fischer wrote:
>> 
>>> On 09.02.2012 15:23, Huaidong Qiu wrote:
>>>> Where can we get the data? I can help to check and understand the toolset
>>>> and process.
>>> 
>>> That sounds great.  Please have a look at
>>> 
>>> http://people.apache.org/~af/index.html
>> 
>> It would be a good idea to add an INFRA issue to JIRA to track loading of this data
>> to the Apache Pootle Server.
> 
> I don't know if we are ready for this yet.  First we have to make sure which data set
> to use.
> 
> But you are right, eventually we have to think about how to upload data to the pootle
> server.  And when.  I mean, after the initial upload of the whole data set, how do
> we handle new strings introduced by e.g. new UI elements?  One option would be to have the build
> server perform an additional step: determine the new strings and upload them.
> Would that technically be possible, or do we have to manually open an INFRA issue for
> each upload?
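
The "determine the new strings" step proposed above could be sketched roughly as follows. This is only an illustration in Python: the string ids are made up, the function name `find_new_strings` is hypothetical, and how the build server would extract the ids or upload the result to Pootle is not covered here.

```python
def find_new_strings(previous_ids, current_ids):
    """Return the string ids that appear in the current build only.

    Both arguments are iterables of string ids; how they are extracted
    from the build output is an assumption left open in this sketch.
    """
    return sorted(set(current_ids) - set(previous_ids))


# Hypothetical ids, for illustration only.
previous_build = ["menu.file.open", "menu.file.save"]
current_build = ["menu.file.open", "menu.file.save", "menu.file.export"]

new_strings = find_new_strings(previous_build, current_build)
print(new_strings)  # only the ids that would need uploading
```

A set difference keeps the step cheap, but it only detects added ids; changed or removed strings would need separate handling.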

These details will need to be negotiated with Infrastructure. I believe that the project will
be granted the appropriate karma to maintain the project's data.

I'm ignorant of the details and my purpose is to make sure that AOO is moving forward with
respect to Pootle. When you are ready to discuss details then you should email infrastructure@a.o.
It is a private ML so please avoid cross-posting.

Thanks for your leadership in this area.

Thanks and Regards,
Dave


> 
> Regards,
> Andre
> 
>> 
>> Regards,
>> Dave
>> 
>> 
>>> 
>>> -Andre
>>> 
>>>> 
>>>> On Thu, Feb 9, 2012 at 1:04 AM, Louis Suárez-Potts
>>>> <lsuarezpotts@gmail.com> wrote:
>>>> 
>>>>> Hi
>>>>> 
>>>>> On 8 February 2012 11:49, Andre Fischer <af@a-w-f.de> wrote:
>>>>>> On 08.02.2012 17:31, Stuart Swales wrote:
>>>>>>> 
>>>>>>> On 07/02/2012 14:02, Andre Fischer wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I recently had a little time to look at the pootle data. Here is what I
>>>>>>>> have found out so far. Please keep in mind that this is new for me and
>>>>>>>> that my interpretations may be wrong.
>>>>>>>> 
>>>>>>>> For context I will start with a short description of the directory
>>>>>>>> structure of the 80 GB of the backup disk:
>>>>>>>> 
>>>>>>>> In the top-level podirectory/ there is a sub-directory openoffice_org/
>>>>>>>> that probably is the translation data of OpenOffice.org. It contains
>>>>>>>> sub-directories for most languages (more on the exact set below.)
>>>>>>>> The content of podirectory is available at [1].
>>>>>>>> 
>>>>>>>> Below the top-level backup/ there are two directories DEV_m103/ and
>>>>>>>> DEV_94/ for two milestones. Below these you can find directories like
>>>>>>>> backconvert-110326/ that probably contain backups for certain dates
>>>>>>>> (March 26 2011 in this example). The most recent is
>>>>>>>> DEV_m103/backconvert-110401 from April 1st of last year.
>>>>>>>> 
>>>>>>>> After comparing time stamps I now think that we can disregard the whole
>>>>>>>> backup/ directory. There are .po files under podirectory/ that are from
>>>>>>>> later than April 1st. Some files are from May.
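
A time stamp comparison like the one described above could be reproduced with a short script. The following Python sketch walks a directory tree and reports the most recently modified .po file. The function name `newest_po_file` is hypothetical, and the path `podirectory` is assumed to be a local copy of the data; the sketch relies on file modification times having survived the backup.

```python
import os
from datetime import datetime, timezone


def newest_po_file(root):
    """Return (path, mtime) of the most recently modified .po file below root.

    Returns (None, None) if no .po file is found.
    """
    newest_path, newest_mtime = None, None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".po"):
                continue
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if newest_mtime is None or mtime > newest_mtime:
                newest_path, newest_mtime = path, mtime
    return newest_path, newest_mtime


# "podirectory" is an assumed local path, not a verified location.
path, mtime = newest_po_file("podirectory")
if path is None:
    print("no .po files found")
else:
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
    print(path, stamp.isoformat())
```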
>>>>>>>> 
>>>>>>>> I then tried to find out whether the pootle data are older or newer than
>>>>>>>> the data in the extras/l10n module in our SVN repository. The timestamps
>>>>>>>> in the .sdf files are useless, our tools set them all to 2002-02-02. The
>>>>>>>> file time stamps cannot be used directly because of the differing
>>>>>>>> directory structures.
>>>>>>>> 
>>>>>>>> Comparing the set of languages of the pootle server and that in
>>>>>>>> extras/l10n/ was also inconclusive:
>>>>>>>> The set of languages that are present in both data sets is
>>>>>>>> af ar as ast bg bn bo bs ca cs cy da dz es et fa fr fur ga gd gl gu he
>>>>>>>> hi hu id is it ja jbo ka kab kn ko ku lt lv ml mr my nb nl nn nr nso ny
>>>>>>>> oc om or pap pl ps pt ru sc si sk so sq ss st sv ta te th tn tr ts ug
>>>>>>>> uk uz ve vi xh zu
>>>>>>>> 
>>>>>>>> Languages only in extras/l10n/ are:
>>>>>>>> be-BY br brx de dgo el eo eu fi hr kid kk km kok ks ky mai mk mn mni ne
>>>>>>>> pa-IN ro rw sa-IN sat sd sh sl sr sw-TZ tg
>>>>>>>> 
>>>>>>>> Languages only on the pootle server are:
>>>>>>>> pyg son tk tlh
>>>>>>>> 
>>>>>>>> See [2] for a list of language ids. (tlh for example is Klingon)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So, we probably have to merge both data sets and hope for the best.
>>>>>>>> Any information from people who know the localization process better is
>>>>>>>> welcome.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Andre
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [1] http://people.apache.org/~af/index.html
>>>>>>>> [2] http://www.loc.gov/standards/iso639-2/php/code_list.php
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> And what has happened to en-GB and en-ZA ?
>>>>>> 
>>>>>> 
>>>>>> Ah, at least one person who reads my mails :-)
>>>>>> 
>>>>>> I forgot to add the following languages as being present in both locations:
>>>>>>    ca-XV en-GB en-ZA pt-BR zh-CN zh-TW
>>>>>> 
>>>>>> Reason: These six language ids are written slightly differently on the
>>>>>> pootle server (with a '_' (underline) in the middle) and in l10n/ (with a
>>>>>> '-' (dash)).  I sorted them differently and then forgot about them. Sorry.
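
The mismatch described above goes away if the ids are normalized to one separator before the two sets are compared. Below is a minimal Python sketch of that comparison. The function names are hypothetical, and the sample lists contain only a few ids taken from this thread, not the full data sets.

```python
def normalize(lang_id):
    """Rewrite pootle-style ids (en_GB) to the l10n/ style (en-GB)."""
    return lang_id.replace("_", "-")


def compare_language_sets(pootle_ids, l10n_ids):
    """Split language ids into shared, l10n-only and pootle-only groups."""
    pootle = {normalize(i) for i in pootle_ids}
    l10n = {normalize(i) for i in l10n_ids}
    return {
        "both": sorted(pootle & l10n),
        "only_l10n": sorted(l10n - pootle),
        "only_pootle": sorted(pootle - l10n),
    }


# Small samples using ids mentioned in this thread.
pootle_sample = ["af", "en_GB", "pt_BR", "tlh"]
l10n_sample = ["af", "de", "en-GB", "pt-BR"]

result = compare_language_sets(pootle_sample, l10n_sample)
print(result["both"])         # ['af', 'en-GB', 'pt-BR']
print(result["only_l10n"])    # ['de']
print(result["only_pootle"])  # ['tlh']
```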
>>>>> 
>>>>> Thanks. And I too actually read your mail messages :-)--and deeply
>>>>> appreciate the work.
>>>>> 
>>>>> ciao
>>>>> louis
>>>>>> 
>>>>>> -Andre
>>>>> 
>>>> 
>> 

