lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthew Hall <mh...@informatics.jax.org>
Subject Re: Problem with search an exact word and stemming
Date Fri, 27 Jun 2008 13:35:21 GMT
Also, please note that I thought about it and realized that I mispoke 
when I sent out my original suggestion.  You don't want an untokenized 
field in your case, you want an unstemmed one instead.

This will allow you to get the functionality you are looking for.. at 
least I believe so ^^

Anyhow, best of luck!

Matt

renou oki wrote:
>  Thanks for the reply.
>
> I will try to add an other data field.
> I thought about this solution but i was not very sure. I thought that was an
> easier solution to do that...
>
>
> best regards
> Renou
>
> 2008/6/26 Matthew Hall <mhall@informatics.jax.org>:
>
>   
>> You could also add another data field to the index, with an untokenized
>> version of your data, and then use a multifield query to go against both the
>> stemmed and exact match parts of your search at the same time.
>>
>> This is a technique I've used quite often on my project with various
>> different requirements for the second field.  Mind you it makes the indexes
>> bigger, but unless your dataset is large its not really a huge problem.
>>
>> Matt
>>
>> Erick Erickson wrote:
>>
>>     
>>> The way I've solved this is to index the stemmed *and* a special
>>> token at the same position (see Synonym Analyzer). The From your
>>> example, say you're indexing progresser. You'd go ahead and index the
>>> stemmed version , "progress", AND you'd also index "progresser$"
>>> at the same offset. Now, when you want exact matches, search for
>>> the token with the $ at the end.
>>>
>>> This does make your index a bit larger, but not as much as you'd expect.
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Jun 25, 2008 at 4:21 AM, renou oki <occhipin@gmail.com> wrote:
>>>
>>>
>>>
>>>       
>>>> Hello,
>>>>
>>>> I have a stemmed index, but i want to search the exact form of a word.
>>>> I use French Analyzer, so for instance "progression", "progresser" are
>>>> indexed with the linguistic root "progress".
>>>> But if I want to search the word "progress" (and only this word), I have
>>>> to
>>>> many hits (because of "progression", "progresser"...)
>>>> The field is indexed, tokenized and no store...
>>>>
>>>> Is there a way to do this, I mean to search an exact word in a stemmed
>>>> index
>>>> ?
>>>> I suppose that I have to use the same analyzer for indexing and
>>>> searching.
>>>>
>>>>
>>>> I try with a PhraseQuery, with quotes...
>>>>
>>>> Ps : I use lucene 1.9.1
>>>>
>>>> Thanks
>>>> Renald
>>>>
>>>>
>>>>
>>>>         
>>>
>>>       
>> --
>> Matthew Hall
>> Software Engineer
>> Mouse Genome Informatics
>> mhall@informatics.jax.org
>> (207) 288-6012
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>     
>
>   

-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message