lucene-java-user mailing list archives

From Michal Lopuszynski <lop...@gmail.com>
Subject Re: Hunspell low level interface in Lucene 4.8
Date Mon, 16 Jun 2014 08:16:39 GMT
Hi Robert,

thank you for your answer!

Hmmm... I need a plain stemmer, i.e. functionality that takes a word and
returns a list of stems.
Wrapping every word in a TokenStream, which does a lot of things I do
not need, seems like overkill and a waste of resources...
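To make the trade-off concrete, here is a minimal sketch of the TokenStream route, assuming Lucene 4.8's analyzers-common module (Dictionary, HunspellStemFilter, KeywordTokenizer) is on the classpath. The class name HunspellWordStems and the toy in-memory .aff/.dic contents are hypothetical illustrations, not part of Lucene. It wraps one word in a KeywordTokenizer, runs it through HunspellStemFilter, and collects every emitted term, which is exactly the per-word wrapping described above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.hunspell.Dictionary;
import org.apache.lucene.analysis.hunspell.HunspellStemFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical helper (not part of Lucene): one word in, its Hunspell stems out. */
public final class HunspellWordStems {
  private final Dictionary dictionary;

  public HunspellWordStems(Dictionary dictionary) {
    this.dictionary = dictionary;
  }

  /** Runs a single word through HunspellStemFilter and collects every emitted term. */
  public List<String> stems(String word) throws IOException {
    List<String> result = new ArrayList<>();
    // KeywordTokenizer turns the whole input into exactly one token, so the word is not split.
    try (TokenStream ts = new HunspellStemFilter(
        new KeywordTokenizer(new StringReader(word)), dictionary)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                      // required before the first incrementToken()
      while (ts.incrementToken()) {
        result.add(term.toString());   // each stem arrives as a separate token
      }
      ts.end();
    }                                  // try-with-resources calls close()
    return result;
  }

  public static void main(String[] args) throws IOException, ParseException {
    // Hypothetical toy dictionary: suffix rule "A" appends "s"; the word "cat" carries flag A.
    String aff = "SET UTF-8\nSFX A Y 1\nSFX A 0 s .\n";
    String dic = "1\ncat/A\n";
    Dictionary dictionary = new Dictionary(
        new ByteArrayInputStream(aff.getBytes(StandardCharsets.UTF_8)),
        new ByteArrayInputStream(dic.getBytes(StandardCharsets.UTF_8)));
    System.out.println(new HunspellWordStems(dictionary).stems("cats"));
  }
}
```

As far as the 4.8 sources show, HunspellStemFilter passes a word through unchanged when the dictionary yields no stem, so an unknown word comes back as itself rather than as an empty list. The sketch builds a fresh tokenizer/filter chain per call for simplicity; Lucene 4.x tokenizers can be reused via Tokenizer.setReader() after close(), which would cut the per-word allocation but is not shown here.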

Is there any problem with keeping access to the plain Hunspell stemmer
available, as it was in 4.7?
Lucene exposes many other stemmers. Why keep the very useful Hunspell
functionality hidden, especially since it has been greatly optimized
(a feature I had been missing a lot!)?
It seems to me that making the hunspell.Stemmer class public would do
the trick...

I realize that the Hunspell API may change in future releases, but that is
fine and normal, as we are speaking of very low-level functionality.

Thanks for having a look at this.

Best regards,
Michał



On Sun, Jun 15, 2014 at 2:20 PM, Robert Muir <rcmuir@gmail.com> wrote:
> Can you just use the TokenStream API? That's the one we maintain and support...
>
> On Sat, Jun 14, 2014 at 10:42 AM, Michal Lopuszynski <lopusz@gmail.com> wrote:
>> Dear all,
>>
>> I am not much into searching; however, I have used Lucene for some text
>> postprocessing (esp. stemming), using the low-level tools generously
>> gathered in Lucene.
>>
>> I was very happy to see the memory footprint improvement in the
>> Hunspell stemmer algorithm
>> (https://issues.apache.org/jira/browse/LUCENE-5468) in 4.8.*
>>
>> However, I also found that the low-level interface (basically a class
>> providing a method that takes a String and returns a list of stems),
>> provided by the class HunspellStemmer in Lucene 4.7.2, somehow disappeared
>> in 4.8.*:
>>
>> http://lucene.apache.org/core/4_7_2/analyzers-common/org/apache/lucene/analysis/hunspell/HunspellStemmer.html
>>
>> After a brief analysis of the sources, I found that the functionality I am
>> after is provided by the class org.apache.lucene.analysis.hunspell.Stemmer,
>> which is package-private (!!!) in 4.8.* :(
>>
>> Is it possible to make it public in the forthcoming release?
>> Or maybe I am missing something and there is another way to access such
>> low-level functionality?
>>
>> It would help me a lot! Thank you for any hints on the issue!
>>
>> Kind regards,
>> Michał
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
