lucene-java-user mailing list archives

From Joel Halbert <j...@su3analytics.com>
Subject Re: scoring adjacent terms without proximity search
Date Mon, 02 Nov 2009 08:20:26 GMT
I opted to use the following query to solve this problem, since it meets
my requirements, for the time being.

+(cheese sandwich) "cheese sandwich"~slop

This includes documents with one or more of the terms, but prefers those
where the terms occur as a phrase within an edit distance <= the slop.
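
For anyone who wants to build that query string for an arbitrary list of
terms, here is a minimal sketch in plain Java. The class and method names
(SloppyBoostQuery, build) are made up for illustration and are not part of
Lucene. The sketch assumes the resulting string is handed to the standard
query parser and that the terms contain no query-syntax characters that
would need escaping.

```java
// Hypothetical helper, not Lucene API: builds the query-parser string
//   +(t1 t2 ...) "t1 t2 ..."~slop
// The required clause matches documents with any of the terms; the optional
// sloppy phrase only adds score when the terms occur close together.
final class SloppyBoostQuery {

    static String build(int slop, String... terms) {
        if (terms.length == 0) {
            throw new IllegalArgumentException("at least one term is required");
        }
        if (slop < 0) {
            throw new IllegalArgumentException("slop must be >= 0");
        }
        // Assumption: terms are plain words, so no escaping is done here.
        String joined = String.join(" ", terms);
        return "+(" + joined + ") \"" + joined + "\"~" + slop;
    }

    public static void main(String[] args) {
        // prints: +(cheese sandwich) "cheese sandwich"~2
        System.out.println(build(2, "cheese", "sandwich"));
    }
}
```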


-----Original Message-----
From: Joel Halbert <joel@su3analytics.com>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: scoring adjacent terms without proximity search
Date: Sat, 31 Oct 2009 08:38:29 +0000

Thank you all for your suggestions, I shall have a little think about
the best way forward, and report back if I do anything interesting that
works well.

In answer to Grant's question of why not use PhraseQuery: we do not want
to have an artificial upper limit on the slop, i.e. we do want to
include documents that might have only a subset of the words from the
phrase (e.g. just cheese, or just sandwich, but not both).


-----Original Message-----
From: Robert Muir <rcmuir@gmail.com>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: scoring adjacent terms without proximity search
Date: Fri, 30 Oct 2009 16:04:03 -0400

> I suppose you could precompute the proximity associations by indexing
> n-grams (in this case, Lucene calls them shingles), such that there
> is a single token in your index containing cheese_sandwich (effectively)
>
>
doh, I see Grant already led you in this direction. (sorry for the
duplicate mail)
on average it's worked for me for some things like this.
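
To make the shingle idea concrete, here is a rough plain-Java sketch of
what ends up in the index when word bigrams are emitted alongside the
original tokens. It is not Lucene's ShingleFilter code. The class name
BigramShingles and the "_" separator (borrowed from the cheese_sandwich
example above) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene's ShingleFilter: emits each token
// followed by the bigram formed with the next token, if there is one.
final class BigramShingles {

    static List<String> shingles(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i)); // the original unigram
            if (i + 1 < tokens.size()) {
                // the adjacent pair as a single token, e.g. cheese_sandwich
                out.add(tokens.get(i) + "_" + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: [grilled, grilled_cheese, cheese, cheese_sandwich, sandwich]
        System.out.println(shingles(List.of("grilled", "cheese", "sandwich")));
    }
}
```

A query for the single token cheese_sandwich then only matches documents
where the two words are adjacent, with no proximity search needed at query
time.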

although, I'll try to contribute something actually useful, and mention that
if you use things like shingles, it's good to consider modifying
DefaultSimilarity; look at the setDiscountOverlaps param.
otherwise, I've measured cases where injecting additional tokens will cause
more harm than good, because it has an adverse effect on lengthNorm.
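
As a back-of-the-envelope illustration of the lengthNorm point, here is a
small plain-Java sketch. It assumes the usual 1/sqrt(numTerms) length
normalisation and that the shingle tokens are indexed as overlaps (position
increment 0), so that discounting overlaps leaves only the unigrams in the
count. It ignores field boosts and the lossy one-byte encoding of norms.
LengthNormSketch and its methods are invented names, not Lucene API.

```java
// Hypothetical illustration, not Lucene code: shows how extra shingle tokens
// shrink a 1/sqrt(numTerms) length norm unless overlaps are discounted.
final class LengthNormSketch {

    // Assumed normalisation: 1 / sqrt(number of tokens counted for the field).
    static double lengthNorm(int countedTokens) {
        if (countedTokens <= 0) {
            throw new IllegalArgumentException("countedTokens must be > 0");
        }
        return 1.0 / Math.sqrt(countedTokens);
    }

    // Assumption: every shingle is an overlap token, so discounting overlaps
    // means only the unigrams contribute to the field length.
    static double norm(int unigrams, int shingles, boolean discountOverlaps) {
        int counted = discountOverlaps ? unigrams : unigrams + shingles;
        return lengthNorm(counted);
    }

    public static void main(String[] args) {
        // A 4-word field with 3 bigram shingles:
        System.out.println(norm(4, 3, false)); // 1/sqrt(7), roughly 0.378
        System.out.println(norm(4, 3, true));  // 1/sqrt(4) = 0.5
    }
}
```

Without discounting, the same text looks almost twice as long to the
scorer, so every match in that field is penalised.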



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
