From: Dave Fisher
To: ooo-dev@incubator.apache.org
Date: Thu, 9 Feb 2012 07:45:43 -0800
Subject: Re: Pootle Data

Hi Andre,

On Feb 9, 2012, at 7:09 AM, Andre Fischer wrote:

> On 09.02.2012 15:23, Huaidong Qiu wrote:
>> Where can we get the data? I can help to check and understand the toolset
>> and process.
>
> That sounds great. Please have a look at
>
> http://people.apache.org/~af/index.html

It would be a good idea to add an INFRA issue to JIRA to track loading of this data to the Apache Pootle Server.

Regards,
Dave

>
> -Andre
>
>>
>> On Thu, Feb 9, 2012 at 1:04 AM, Louis Suárez-Potts wrote:
>>
>>> Hi
>>>
>>> On 8 February 2012 11:49, Andre Fischer wrote:
>>>> On 08.02.2012 17:31, Stuart Swales wrote:
>>>>>
>>>>> On 07/02/2012 14:02, Andre Fischer wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I recently had a little time to look at the Pootle data. Here is what I
>>>>>> have found out so far. Please keep in mind that this is new for me and
>>>>>> that my interpretations may be wrong.
>>>>>>
>>>>>> For context I will start with a short description of the directory
>>>>>> structure of the 80 GB of the backup disk:
>>>>>>
>>>>>> In the top-level podirectory/ there is a sub-directory openoffice_org/
>>>>>> that probably is the translation data of OpenOffice.org. It contains
>>>>>> sub-directories for most languages (more on the exact set below).
>>>>>> The content of podirectory is available at [1].
>>>>>>
>>>>>> Below the top-level backup/ there are two directories DEV_m103/ and
>>>>>> DEV_94/ for two milestones. Below these you can find directories like
>>>>>> backconvert-110326/ that probably contain backups for certain dates
>>>>>> (March 26 2011 in this example). The most recent is
>>>>>> DEV_m103/backconvert-110401 from April 1st of last year.
>>>>>>
>>>>>> After comparing time stamps I now think that we can disregard the whole
>>>>>> backup/ directory. There are .po files under podirectory/ that are from
>>>>>> later than April 1st. Some files are from May.
>>>>>>
>>>>>> I then tried to find out whether the Pootle data are older or newer than
>>>>>> the data in the extras/l10n module in our SVN repository. The timestamps
>>>>>> in the .sdf files are useless; our tools set them all to 2002-02-02. The
>>>>>> file time stamps cannot be used directly because of the differing
>>>>>> directory structures.
>>>>>>
>>>>>> Comparing the set of languages of the Pootle server and that in
>>>>>> extras/l10n/ was also inconclusive:
>>>>>> The set of languages that are present in both data sets is
>>>>>> af ar as ast bg bn bo bs ca cs cy da dz es et fa fr fur ga gd gl gu he
>>>>>> hi hu id is it ja jbo ka kab kn ko ku lt lv ml mr my nb nl nn nr nso ny
>>>>>> oc om or pap pl ps pt ru sc si sk so sq ss st sv ta te th tn tr ts ug
>>>>>> uk uz ve vi xh zu
>>>>>>
>>>>>> Languages only in extras/l10n/ are:
>>>>>> be-BY br brx de dgo el eo eu fi hr kid kk km kok ks ky mai mk mn mni ne
>>>>>> pa-IN ro rw sa-IN sat sd sh sl sr sw-TZ tg
>>>>>>
>>>>>> Languages only on the Pootle server are:
>>>>>> pyg son tk tlh
>>>>>>
>>>>>> See [2] for a list of language ids. (tlh, for example, is Klingon.)
>>>>>>
>>>>>>
>>>>>> So, we probably have to merge both data sets and hope for the best.
>>>>>> Any information from people who know the localization process better is
>>>>>> welcome.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Andre
>>>>>>
>>>>>>
>>>>>> [1] http://people.apache.org/~af/index.html
>>>>>> [2] http://www.loc.gov/standards/iso639-2/php/code_list.php
>>>>>
>>>>>
>>>>>
>>>>> And what has happened to en-GB and en-ZA?
>>>>
>>>>
>>>> Ah, at least one person who reads my mails :-)
>>>>
>>>> I forgot to add the following languages as being present in both locations:
>>>> ca-XV en-GB en-ZA pt-BR zh-CN zh-TW
>>>>
>>>> Reason: These six language ids are written slightly differently on the
>>>> Pootle server (with a '_' (underscore) in the middle) and in l10n/ (with a
>>>> '-' (dash)). I sorted them differently and then forgot about them. Sorry.
>>>
>>> Thanks. And I too actually read your mail messages :-) and deeply
>>> appreciate the work.
>>>
>>> ciao
>>> louis
>>>>
>>>> -Andre
>>>
>>
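The language comparison discussed in the thread, including the six ids that only match once '_' is treated the same as '-', can be sketched as a plain set comparison. This is a minimal Python sketch, not the tooling anyone in the thread used: the function names `normalize` and `compare` are hypothetical, the inputs are assumed to be plain lists of language ids (for example the sub-directory names under podirectory/openoffice_org/ on one side and the ids from extras/l10n/ on the other), and the only normalization assumed is replacing '_' with '-'.

```python
"""Hypothetical sketch: compare language ids from two localization data sets."""


def normalize(lang_id: str) -> str:
    # Assumption: the only spelling difference is the separator.
    # The Pootle side writes e.g. 'en_GB', extras/l10n writes 'en-GB'.
    return lang_id.replace("_", "-")


def compare(pootle_ids, l10n_ids):
    """Return sorted lists of ids in both sets and in only one of them."""
    pootle = {normalize(i) for i in pootle_ids}
    l10n = {normalize(i) for i in l10n_ids}
    return {
        "both": sorted(pootle & l10n),
        "only_l10n": sorted(l10n - pootle),
        "only_pootle": sorted(pootle - l10n),
    }


if __name__ == "__main__":
    # Small illustrative subsets of the lists quoted above, not the full data.
    pootle_ids = ["af", "en_GB", "pt_BR", "tlh", "son"]
    l10n_ids = ["af", "en-GB", "pt-BR", "de", "eo"]
    for key, ids in compare(pootle_ids, l10n_ids).items():
        print(key, " ".join(ids))
    # prints:
    # both af en-GB pt-BR
    # only_l10n de eo
    # only_pootle son tlh
```

Normalizing before comparing is what moves ids such as en_GB/en-GB out of the "only on one side" lists and into the shared list; any other spelling differences between the two data sets would need their own rule.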