lucene-dev mailing list archives

From Terry Steichen <te...@net-frame.com>
Subject Re: Lucene 1.9 RC1 release available
Date Wed, 22 Feb 2006 00:37:50 GMT
Marvin,

While a stemming analyzer can work well for general purpose queries, if 
you're seeking a decent level of precision/recall, stemming often 
severely limits you.  Moreover, unless the user is very familiar with 
the behavior of the stemmer used, some of the returned results can be 
quite surprising.  While the logic of stemmers can, as you suggest, 
eliminate some false positives, it will at the same time introduce new 
ones, and false negatives as well.

I think the key is that, even if you have imprecise query demands that 
can be met by stemming, why limit Lucene's capability to achieve high 
levels of precision?  Especially when the alternative (in terms of the 
cat? behavior) provides a capability (matching a specific number of 
characters) that very few applications apparently need?
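
To make the two readings of cat? concrete, here is a minimal sketch that emulates them with java.util.regex. It is not Lucene's WildcardQuery code; the class name WildcardSketch, its methods, and the "lenient" flag are hypothetical names used only for illustration. It assumes, per the quoted message below, that in 1.4.x "?" could match zero or one character, while after the change it matches exactly one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: emulates the two '?' semantics with plain regexes.
// This is NOT how Lucene implements wildcard matching.
final class WildcardSketch {

    // Translate a wildcard term into a regex.
    // lenient = true  -> '?' matches zero or one character (1.4.x behavior as described in this thread)
    // lenient = false -> '?' matches exactly one character (the changed behavior)
    static Pattern compile(String wildcard, boolean lenient) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");                 // any run of characters, including none
            } else if (c == '?') {
                regex.append(lenient ? ".?" : "."); // the semantics under discussion
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Return the terms that the whole wildcard pattern matches.
    static List<String> matching(String wildcard, boolean lenient, List<String> terms) {
        Pattern p = compile(wildcard, lenient);
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (p.matcher(t).matches()) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("cat", "cats", "cater", "catches", "category");
        System.out.println(matching("cat?", true, terms));   // zero-or-one reading
        System.out.println(matching("cat?", false, terms));  // exactly-one reading
        System.out.println(matching("cat*", false, terms));  // prefix wildcard
    }
}
```

Under the zero-or-one reading, cat? selects cat and cats; under the exactly-one reading it selects only cats, which is why the query has to become (cat cat?); cat* selects every term in the sample list.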

Terry

Marvin Humphrey wrote:

> Terry,
>
> Is there a reason you wouldn't use a stemming analyzer of some kind,  
> which would match cat and cats but not cater, catches, etc?
>
> http://snowball.tartarus.org/demo.php
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>
> On Feb 21, 2006, at 3:13 PM, Terry Steichen wrote:
>
>> No, I don't think that the riot* option would work for many  
>> queries.  Let's take a simple case where you want a singular or  
>> plural form, like either cat or cats (which would be very common).   
>> With 1.4.x, you can use cat? to retrieve such matches.  With the new 
>> change, you need to use (cat cats) or (cat cat?).  If you use cat*, 
>> you'll get a million matches you don't want (cater, catches,  
>> catwoman, category, catatonic, cataclysm, catamount, etc.).  Or,  
>> take a case where you want to retrieve terms like elder, elderly,  
>> elders but do not want things like elderberry, elderdice.  Or you  
>> want gun or guns, but not gunmen, gunshots, gunfire, gunpoint,  
>> gunston, etc.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
