lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: Query Term Questions
Date Wed, 21 Jan 2004 15:30:57 GMT
On Jan 21, 2004, at 10:01 AM, Terry Steichen wrote:
>> But doesn't the query itself take this into account?  If there are
>> multiple matching terms then the overlap (coord) factor kicks in.
> TS==>Except that I'd like to be able to choose to do this on a
> query-by-query basis.  In other words,
> it's desirable that some specific queries significantly increase their
> discrimination based on this multiple matching,
> relative to the normal extra boost given by the coord factor.  
> However, I
> take it from your answer that
> there's not a way to do this in the query itself (at least using the
> unmodified, standard Lucene version).

Don't interpret my replies as being absolute here - I'm still learning 
lots about Lucene and am open to being shown new ways of doing things 
with it.

>> Another reply mentioned negative boosting.  Is that not working as
>> you'd like?
> TS==>I've not been able to get negative boosting to work at all.  Maybe
> there's a problem with my syntax.
> If, for example, I do a search with "green beret"^10, it works just 
> fine.
> But "green beret"^-2 gives me a
> ParseException showing a lexical error.

Have you tried it without using QueryParser and boosting a Query using 
setBoost on it?  QueryParser is a double-edged sword and it looks like 
it only allows numeric characters (plus "." followed by numeric 
characters).  So QueryParser has the problem with negative boosts, but 
not Query itself.

>> Sounds like what you're really after is fancier analysis.  This is one
>> of the purposes of analysis, to do stemming.
> TS==>Well, I hope I'm not trying to be fancy.  It's just that listing 
> all
> the different variants, particularly (as in my
> case) I have to do this for multiple fields, gets tedious and 
> error-prone.
> The example above is simply one such case
> for a particular query - other queries may have entirely different 
> desired
> combinations.  Constructing a single stemmer
> to handle all such cases would be (for me, at least) very difficult.
> Besides, I tend to stay away from stemming because
> I believe it can introduce some rather unpredictable side-effects.

I'd still recommend trying some of the other analyzer options out there 
and seeing if you can tweak things to your liking.  This is really the 
answer for what you are after, I'm almost certain.  Good stemmers exist 
- look at the Porter one or the Snowball ones.  Write some test cases 
to "analyze the analyzer" like I did in my articles - it 
really will let you experiment with indexing and searching easily.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message