openoffice-dev mailing list archives

From "Marco A.G.Pinto" <marcoagpi...@mail.telepac.pt>
Subject Hunspell unmunching question
Date Thu, 04 Dec 2014 15:35:28 GMT
Hello!

Around a week ago, Peter from England sent me an e-mail suggesting new 
words to be added to en_GB.

One of them was "unsubscribe".

Here is what appears in the Proofing Tool GUI (PTG):


The strange thing is that when I tried the variants in Mozilla and OpenOffice,
none of them was marked as a typo.

I started thinking about it and wondered whether, in Hunspell, prefixes
attach themselves to all suffixes.
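A minimal Python model of the rule that seems to be at work, assuming the cross-product semantics described in the hunspell(5) manual: each PFX/SFX header carries a Y/N cross-product field, and a prefix may combine with a suffix on the same stem only when both are declared Y. The flag letters (A, O, U, D, G, S) and the rules below are hypothetical, chosen only to reproduce the "subscribe" example; they are not taken from the en_GB .aff, and conditions are treated as plain Python regexes as a simplification.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Affix:
    flag: str       # flag character used after "/" in the .dic entry
    kind: str       # "PFX" or "SFX"
    cross: bool     # cross-product field of the PFX/SFX header (Y -> True)
    strip: str      # characters removed from the word ("0" in .aff -> "")
    add: str        # characters added
    condition: str  # simplified condition, treated here as a Python regex

    def apply(self, word):
        """Return the affixed form, or None if the rule does not apply."""
        if self.kind == "SFX":
            if not (word.endswith(self.strip)
                    and re.search("(?:" + self.condition + ")$", word)):
                return None
            return word[: len(word) - len(self.strip)] + self.add
        if not (word.startswith(self.strip)
                and re.match("(?:" + self.condition + ")", word)):
            return None
        return self.add + word[len(self.strip):]


# Hypothetical affix table, loosely modelled on English .aff files.
RULES = [
    Affix("A", "PFX", True, "", "re", "."),
    Affix("O", "PFX", True, "", "over", "."),
    Affix("U", "PFX", True, "", "un", "."),
    Affix("D", "SFX", True, "", "d", "e"),
    Affix("D", "SFX", True, "", "ed", "[^e]"),
    Affix("G", "SFX", True, "e", "ing", "e"),
    Affix("G", "SFX", True, "", "ing", "[^e]"),
    Affix("S", "SFX", True, "", "s", "."),
]


def expand(stem, flags, rules=RULES):
    """Generate every surface form of one .dic entry (stem/flags)."""
    active = [r for r in rules if r.flag in set(flags)]
    prefixes = [r for r in active if r.kind == "PFX"]
    suffixes = [r for r in active if r.kind == "SFX"]
    forms = {stem}
    suffixed = []
    for s in suffixes:
        w = s.apply(stem)
        if w is not None:
            forms.add(w)
            suffixed.append((s, w))
    for p in prefixes:
        w = p.apply(stem)
        if w is not None:
            forms.add(w)
        # Cross product: prefix + suffix only when BOTH classes are declared Y.
        for s, sw in suffixed:
            if p.cross and s.cross:
                pw = p.apply(sw)
                if pw is not None:
                    forms.add(pw)
    return forms
```

In this model, `expand("subscribe", "AOUDGS")` yields the same 16 forms listed in the Unmunch output, and `expand("subscribe", "UAODGS")` gives an identical set, since the flags are treated as a set and their order does not matter. Declaring U with N instead of Y keeps "unsubscribe" but drops "unsubscribed", "unsubscribes" and "unsubscribing".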

Today I ran a test; please see the archive:
https://dl.dropboxusercontent.com/u/30674540/hunspell_issue_marcoagpinto_20141204.zip
It contains the wordlists extracted by both PTG and Unmunch, plus the
.DIC and .AFF files I created for the tests.

In my PTG 3.0 build 67, I get:

    subscribe
    resubscribe
    subscribing
    oversubscribe
    subscribes
    subscribed
    unsubscribe
    000
    subscribe
    unsubscribe
    resubscribe
    subscribing
    oversubscribe
    subscribes
    subscribed

In Unmunch for Linux, I got:

    subscribe
    subscribing
    subscribed
    subscribes
    resubscribing
    oversubscribing
    unsubscribing
    resubscribed
    oversubscribed
    unsubscribed
    resubscribes
    oversubscribes
    unsubscribes
    resubscribe
    oversubscribe
    unsubscribe
    000
    subscribe
    subscribing
    subscribed
    subscribes
    resubscribing
    oversubscribing
    unsubscribing
    resubscribed
    oversubscribed
    unsubscribed
    resubscribes
    oversubscribes
    unsubscribes
    resubscribe
    oversubscribe
    unsubscribe

I placed a "000" line to separate two copies of the same word in which the
code "U" appears in a different position, to make sure it would produce
the same results regardless of its position.

This probably means I need to change my tool's code, perhaps by creating
three arrays:
1st - to store the words with suffixes
2nd - to store the codes of the prefixes
3rd - to store the 1st plus all its combinations with the prefixes
(applying the prefixes to the 1st and storing the results in the 3rd)
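The three-array plan could be sketched roughly as follows. This is a hypothetical Python illustration, not PTG's actual code: it assumes an already parsed affix table (the PREFIX_RULES and SUFFIX_RULES names and their contents are made up for the "subscribe" example), lets array 1 also hold the bare stem, and only combines a prefix with a suffixed form when both rules have the cross-product field set to Y.

```python
import re

# Hypothetical, pre-parsed affix rules (not PTG's real data):
# flag -> list of (strip, add, condition, cross_product)
PREFIX_RULES = {"A": [("", "re", ".", True)], "U": [("", "un", ".", True)]}
SUFFIX_RULES = {
    "D": [("", "d", "e", True), ("", "ed", "[^e]", True)],
    "G": [("e", "ing", "e", True), ("", "ing", "[^e]", True)],
    "S": [("", "s", ".", True)],
}


def three_arrays(stem, flags):
    """Build the three arrays for one .dic entry such as subscribe/AUDGS."""
    # 1st array: the stem plus its suffixed forms, as (word, suffixed, suffix_cross).
    first = [(stem, False, True)]
    for flag in flags:
        for strip, add, cond, cross in SUFFIX_RULES.get(flag, []):
            if stem.endswith(strip) and re.search("(?:" + cond + ")$", stem):
                first.append((stem[: len(stem) - len(strip)] + add, True, cross))

    # 2nd array: the prefix flags (codes) present in the entry.
    second = [flag for flag in flags if flag in PREFIX_RULES]

    # 3rd array: the 1st array followed by every allowed prefix combination.
    third = [word for word, _, _ in first]
    for flag in second:
        for strip, add, cond, prefix_cross in PREFIX_RULES[flag]:
            for word, suffixed, suffix_cross in first:
                # A prefix may join a suffixed form only if both rules are cross-product (Y).
                if suffixed and not (prefix_cross and suffix_cross):
                    continue
                if word.startswith(strip) and re.match("(?:" + cond + ")", word):
                    third.append(add + word[len(strip):])
    return first, second, third
```

Because array 3 starts with array 1 and appends the prefixed forms afterwards, displaying it in order places the prefixed words at the bottom, wherever the prefix flags appear in the entry.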

Then, would I display the prefixed words at the bottom in PTG, rather than
following the order of the codes?

It also means that hundreds of combinations are missing from the .txt
wordlist I always publish on the project's GitHub, even though Hunspell
accepts them in Mozilla (Firefox, Thunderbird and SeaMonkey) and Apache
OpenOffice.
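To count exactly which combinations are missing, one option is to diff the published wordlist against a full expansion such as the Unmunch output shown above. The sketch below assumes both files are plain UTF-8 text with one word per line; the script and file names are hypothetical.

```python
import sys
from pathlib import Path


def load_words(path):
    """Read a one-word-per-line list (assumed UTF-8), skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def missing_forms(published, expanded):
    """Return, sorted, the expanded forms that the published list lacks."""
    return sorted(set(expanded) - set(published))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_lists.py published.txt unmunched.txt
    missing = missing_forms(load_words(sys.argv[1]), load_words(sys.argv[2]))
    print("\n".join(missing))
    print(f"{len(missing)} forms missing from the published list", file=sys.stderr)
```

Using sets makes the comparison independent of line order, so the PTG and Unmunch lists can be compared even though they list the forms in different sequences.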

Thanks for your time!

Kind regards,
       Marco A.G.Pinto
         ----------------------

