From: Rob Weir
To: "dev@openoffice.apache.org", marcoagpinto@mail.telepac.pt
Date: Mon, 6 May 2013 19:23:11 -0400
Subject: Re: Proofing Tool GUI
In-Reply-To: <5186C4ED.5090001@mail.telepac.pt>

On Sun, May 5, 2013 at 4:45 PM, Marco A.G.Pinto <marcoagpinto@mail.telepac.pt> wrote:

> Hello my dear ones,
>
> A couple of days ago I was on IRC in #dev.openoffice.org chatting with
> JZA.
>
> I came up with the idea of creating a GUI to edit the thesaurus of AOO.
>
> JZA told me the files were in TXT format and gave me a URL with several
> information but I gave a quick look and didn't find anything about the data
> dictionary of the thesaurus.
>
> The tool will be called "Proofing Tool GUI" and will be coded in
> PureBasic. Is this a good name? PureBasic allows to compile in
> Windows/Linux/Mac/Amiga.
>
> The reason why I want to code it is because months ago I contacted my
> friends at Minho University in Portugal who are in charge of PT-pt and I
> wanted to send them words to be used as synonymous but they didn't know how
> to add them.

This makes me wonder... Does it still make sense, in the year 2013, for
updates to dictionaries and thesauruses to require a download and install
of a large file? Is there a way to do this incrementally, even live, based
on a feed (RSS or Atom)? So I could have AOO "subscribe" to a dictionary
and receive new words as they become popular. Maybe there could even be a
custom subscription that is used only within a company, to publish the
special words used there: technical terms, product names, etc. You could
even have a menu option as part of spell checking: "Add to shared
dictionary...".

-Rob

> This made me think that there isn't a tool for doing that, so my idea is
> good because it can be used by the whole community of developers.
>
> I unziped the Portuguese .OXT and grabbed the files:
> - th_pt_PT.idx
> - th_pt_PT.dat
>
> I opened them with Microsoft Expression Web 4 to keep the UTF-8 format but
> didn't understand completely how they work.
>
> For example, in the *.idx* one I had:
> UTF-8
> 12940
> 1|6
> a cerca de|16097
> a começar de|19986
> a favor|32934
> a partir de|67469
> a respeito de|77248
> ... etc...
>
>
> in the *.dat* one I had:
> UTF-8
> 1|3
> -|anuviado
> -|aperitivo
> -|sigla
> ababelado|1
> -|atrapalhado|baralhado|atarantado|desnorteado
> ababelar|1
> -|baralhar|atrapalhar
> abaçanado|1
> ... etc...
>
> It seems there are at least three levels of synonymous in the *.dat* one
> but I don't know how to interpret them if I create a GUI.
>
> Also, in the *.idx* one there are numbers too which I don't understand
> the meaning.
>
> Is there a URL which explains every detail of those files?
>
> Thanks!
>
> Kind regards from,
> >Marco A.G.Pinto
> -----------------------
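
On the file-format question quoted above, here is a rough sketch rather than an authoritative answer. It assumes the two files follow the MyThes-style thesaurus layout, which the quoted samples are consistent with. Under that assumption, the first line of each file names the encoding. A .dat line such as "ababelado|1" is a headword followed by the number of meaning lines that come next. Each meaning line is a tag field ("-" in these samples, presumably a part-of-speech slot) followed by the synonyms for that meaning. So the "three levels" under "1|3" would be three separate meanings of the entry "1". The number after each word in the .idx file would then be the byte offset of that headword's line inside the .dat file, and 12940 is presumably the entry count. The helper name parse_dat is made up for illustration and is not part of AOO.

```python
# Sketch only: assumes the MyThes-style layout inferred from the quoted sample.
# parse_dat is a hypothetical helper name, not an AOO or MyThes API.

def parse_dat(data: bytes):
    """Return (headword, byte_offset, meanings) for each entry in a .dat file."""
    lines = data.split(b"\n")
    encoding = lines[0].decode("ascii").strip()    # line 1 names the text encoding
    offset = len(lines[0]) + 1                     # bytes consumed so far, newline included
    entries = []
    i = 1
    while i < len(lines):
        raw = lines[i]
        if not raw.strip():                        # tolerate blank or trailing lines
            offset += len(raw) + 1
            i += 1
            continue
        # "ababelado|1" -> headword and the number of meaning lines that follow
        word, count = raw.decode(encoding).strip().rsplit("|", 1)
        block = lines[i + 1 : i + 1 + int(count)]  # one line per meaning
        # Drop the leading tag field ("-" in the sample); keep the synonyms
        meanings = [m.decode(encoding).strip().split("|")[1:] for m in block]
        entries.append((word, offset, meanings))
        offset += len(raw) + 1 + sum(len(m) + 1 for m in block)
        i += 1 + len(block)
    return entries


# The truncated sample from the quoted message, minus the "... etc..." line
sample = (
    "UTF-8\n"
    "1|3\n-|anuviado\n-|aperitivo\n-|sigla\n"
    "ababelado|1\n-|atrapalhado|baralhado|atarantado|desnorteado\n"
    "ababelar|1\n-|baralhar|atrapalhar\n"
).encode("utf-8")

for word, offset, meanings in parse_dat(sample):
    print(f"{word}|{offset}", meanings)
```

The first entry comes out at offset 6, which is the length of "UTF-8" plus its newline and matches the "1|6" line in the quoted .idx sample. If the assumption holds, a GUI could edit the .dat entries in memory and regenerate the .idx from the recomputed offsets instead of editing both files by hand.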


