lucene-java-user mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Reverse stemmer?
Date Thu, 08 Oct 2009 19:20:10 GMT
Out of curiosity and perhaps for practical purposes, how does one
handle mixed-language documents?  I suppose one could extract the
words of a particular language and place them in a language-specific
field? Are there libraries to perform this (yet)?
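
To make the idea concrete, here is a minimal sketch of routing words into
separate fields. It assumes the languages involved use different scripts
(e.g. English and Russian), so the Unicode script of a word can stand in
for its language; it cannot tell English from German. The class name, the
method names and the "body_" field prefix are hypothetical, and nothing
here is Lucene API.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ScriptFieldSplitter {

    /** Groups whitespace-separated words by the Unicode script of their first letter. */
    static Map<String, List<String>> splitByScript(String text) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields one empty token
            }
            // Hypothetical naming scheme: one field per script, e.g. "body_latin".
            String field = "body_" + scriptOf(word).name().toLowerCase(Locale.ROOT);
            fields.computeIfAbsent(field, k -> new ArrayList<>()).add(word);
        }
        return fields;
    }

    /** Script of the first letter in the word; COMMON if the word has no letters. */
    static UnicodeScript scriptOf(String word) {
        for (int i = 0; i < word.length(); ) {
            int cp = word.codePointAt(i);
            if (Character.isLetter(cp)) {
                return UnicodeScript.of(cp);
            }
            i += Character.charCount(cp);
        }
        return UnicodeScript.COMMON;
    }

    public static void main(String[] args) {
        // The escaped words are Cyrillic ("poisk", "sistema").
        String text = "search \u043f\u043e\u0438\u0441\u043a engine "
                + "\u0441\u0438\u0441\u0442\u0435\u043c\u0430";
        System.out.println(splitByScript(text).keySet());
        // prints [body_latin, body_cyrillic]
    }
}
```

Each resulting field could then be analyzed with the stemmer for its
language. For languages that share a script, a real language identifier
would have to replace scriptOf.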

On Thu, Oct 8, 2009 at 6:32 AM, Christian Reuschling
<christian.reuschling@gmail.com> wrote:
> Hi,
>
> looking up the different terms with a common stem can be useful in different
> scenarios - so I don't want to judge whether someone needs it or not.
>
> E.g., in the case you have multilingual documents in your index, it is
> straightforward to determine the language of each document in order to choose
> the right stemmer. At least this holds for documents in a single,
> homogeneous language.
>
> Although this is true at indexing time, classifying the language of the
> user query is not so trivial - and you have to do this in order to stem the
> query terms for searching. One possibility would be to search for the stems
> produced by all stemmers - but in this case you will get many wrong
> search terms, and thus much noise in the result lists.
>
> Another possibility is to offer all 'potential synonyms' of the query terms
> to the user, who can then choose whether they are right or not. In this case
> you need exactly the lookup 'queryTerm->stem->terms with same stem'. This can
> be much more precise; the drawbacks are of course the interaction required
> from the user and longer queries.
>
> To realize this, someone could write a specific Analyzer that additionally
> stores this relationship, e.g. in a database. I personally don't know of any
> way to read this directly out of the Lucene index.
>
>
> If someone has best practices or an idea of how multilingual indices can be
> processed better, I would appreciate reading / hearing about it.
>
>
>
> all best
>
> Chris
>
>
> On Tue, 6 Oct 2009 16:31:36 +0900
> David Leangen <apache@leangen.net> wrote:
>
>>
>> Hello,
>>
>> I've been using Lucene in a very basic way for some time now, and I'm
>> starting to take advantage of some of the linguistic capabilities only
>> now.
>>
>> I am making use of the snowball analyzer for stemming, and it works
>> very well.
>>
>>
>> Question: is there any such thing as a "reverse stemmer"? In other
>> words, given the stem of a word, is there any algorithm to find the
>> original word? Or is this just fantasy? ;-)
>>
>> Now, I understand that there is a 1:n mapping of stems:words. I can
>> deal with that.
>>
>>
>> Thanks!
>> =David
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
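
The 'queryTerm->stem->terms with same stem' lookup discussed in the quoted
message can be sketched as an in-memory map filled at indexing time. This
is only an illustration under stated assumptions: StemLookup and its
methods are hypothetical names, the map stands in for the database
mentioned above, and toyStem is a crude suffix stripper standing in for a
real stemmer such as Snowball. No Lucene API is used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

class StemLookup {
    private final UnaryOperator<String> stemmer;
    private final Map<String, Set<String>> termsByStem = new HashMap<>();

    StemLookup(UnaryOperator<String> stemmer) {
        this.stemmer = stemmer;
    }

    /** Call once per token at indexing time to remember stem -> original term. */
    void record(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        termsByStem.computeIfAbsent(stemmer.apply(t), k -> new TreeSet<>()).add(t);
    }

    /** queryTerm -> stem -> all recorded terms that share that stem. */
    Set<String> variantsOf(String queryTerm) {
        String stem = stemmer.apply(queryTerm.toLowerCase(Locale.ROOT));
        return termsByStem.getOrDefault(stem, Collections.emptySet());
    }

    /** Toy stand-in for a real stemmer: strips a few English suffixes. */
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        StemLookup lookup = new StemLookup(StemLookup::toyStem);
        for (String t : new String[] {"walk", "walks", "walked", "walking", "talked"}) {
            lookup.record(t);
        }
        System.out.println(lookup.variantsOf("walking"));
        // prints [walk, walked, walking, walks]
    }
}
```

The 1:n mapping of stems to words shows up directly as the Set value. The
variants returned are only those seen while indexing, which is usually
what a "did you also mean" list for the user needs.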
