lucene-java-user mailing list archives

From Nuno Seco <ns...@dei.uc.pt>
Subject Re: Reverse stemmer?
Date Thu, 08 Oct 2009 13:36:42 GMT
Hi.

You may want to take a look at:
http://wordlist.sourceforge.net/
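
As a rough sketch of how such a word list might be used for the
'queryTerm->stem->terms with same stem' lookup discussed below: run every
word through the same stemmer used at index time and group the words by the
stem they produce. The class name ReverseStemIndex and the toyStem function
are made up for illustration; toyStem is only a crude stand-in for a real
Snowball stemmer and is not part of Lucene.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

public class ReverseStemIndex {
    // stem -> all known words that reduce to that stem (the 1:n mapping)
    private final Map<String, SortedSet<String>> wordsByStem = new HashMap<>();
    private final UnaryOperator<String> stemmer;

    public ReverseStemIndex(UnaryOperator<String> stemmer) {
        this.stemmer = stemmer;
    }

    // Register one word from the word list under its stem.
    public void add(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        wordsByStem.computeIfAbsent(stemmer.apply(w), k -> new TreeSet<>()).add(w);
    }

    // queryTerm -> stem -> every registered word sharing that stem.
    public SortedSet<String> variantsOf(String queryTerm) {
        String stem = stemmer.apply(queryTerm.toLowerCase(Locale.ROOT));
        return wordsByStem.getOrDefault(stem, Collections.emptySortedSet());
    }

    // Hypothetical toy stemmer: strips one of a few English suffixes.
    // An assumption for this sketch only; swap in the real stemmer.
    public static String toyStem(String w) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        ReverseStemIndex index = new ReverseStemIndex(ReverseStemIndex::toyStem);
        for (String w : List.of("walk", "walks", "walked", "walking", "talk", "talked")) {
            index.add(w);
        }
        // prints [walk, walked, walking, walks]
        System.out.println(index.variantsOf("walking"));
    }
}
```

With one such table per language, the candidate words for a query term could
be shown to the user, who then picks the ones that are actually meant.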


--
Nuno Seco

Christian Reuschling wrote:
> Hi,
>
> looking up the different terms with a common stem can be useful in different
> scenarios - so I don't want to judge whether someone needs it or not.
>
> E.g., if you have multilingual documents in your index, it is
> straightforward to determine the language of each document in order to choose
> the right stemmer. At least this holds for documents written in a single
> language.
>
> Although this is true at indexing time, language classification for the
> user query is not so trivial - and you have to do this in order to stem the
> query terms for searching. One possibility would be to search for the stems
> produced by all stemmers - but in this case you will get many wrong
> search terms, and thus much noise in the result lists.
>
> Another possibility is to offer all 'potential synonyms' of the query terms
> to the user, who can choose whether these are right or not. In this case
> you need exactly the lookup 'queryTerm->stem->terms with same stem'. This can
> be much more precise; the drawbacks are of course the interaction needed from
> the user and longer queries.
>
> To realize this, someone could write a specific Analyzer that additionally
> stores this relationship, e.g. in a database. I personally don't know of any
> way to read this directly out of the Lucene index.
>
>
> If someone has best practices or an idea how processing multilingual
> indices can be done better, I would appreciate reading / hearing about it.
>
>
>
> all best
>
> Chris
>
>
> On Tue, 6 Oct 2009 16:31:36 +0900
> David Leangen <apache@leangen.net> wrote:
>
>   
>> Hello,
>>
>> I've been using Lucene in a very basic way for some time now, and I'm  
>> starting to take advantage of some of the linguistic capabilities only  
>> now.
>>
>> I am making use of the snowball analyzer for stemming, and it works  
>> very well.
>>
>>
>> Question: is there any such thing as a "reverse stemmer"? In other  
>> words, given the stem of a word, is there any algorithm to find the  
>> original word? Or is this just fantasy? ;-)
>>
>> Now, I understand that there is a 1:n mapping of stems:words. I can  
>> deal with that.
>>
>>
>> Thanks!
>> =David
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>     
>
>   


