lucene-java-user mailing list archives

From liat oren <oren.l...@gmail.com>
Subject Re: scoring adjacent terms without proximity search
Date Mon, 23 Nov 2009 11:55:41 GMT
Hi Joel,

I have encountered the same problem.
Could you please elaborate a bit on this?

Many thanks,
Liat

2009/11/2 Joel Halbert <joel@su3analytics.com>

> I opted to use the following query to solve this problem, since it meets
> my requirements, for the time being.
>
> +(cheese sandwich) "cheese sandwich"~slop
>
> This includes documents with one or more of the terms, but prefers those
> that match the phrase within an edit distance <= the slop.
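
A minimal sketch of how that query string could be assembled, assuming the classic QueryParser syntax with its default OR operator: the required group matches any document containing at least one term, and the optional sloppy phrase only adds to the score when it matches. The class and method names are hypothetical, and the terms are assumed to contain no query-syntax special characters.

```java
import java.util.Arrays;
import java.util.List;

public class ProximityBoostQuery {

    // Builds: +(t1 t2 ...) "t1 t2 ..."~slop
    // Assumption: terms are plain words that need no escaping.
    static String build(List<String> terms, int slop) {
        StringBuilder joined = new StringBuilder();
        for (String term : terms) {
            if (joined.length() > 0) {
                joined.append(' ');
            }
            joined.append(term);
        }
        // Required group: at least one term must match.
        // Optional sloppy phrase: boosts documents where the terms are close.
        return "+(" + joined + ") \"" + joined + "\"~" + slop;
    }

    public static void main(String[] args) {
        // prints: +(cheese sandwich) "cheese sandwich"~3
        System.out.println(build(Arrays.asList("cheese", "sandwich"), 3));
    }
}
```

The resulting string would then be handed to the query parser; the slop value of 3 is only an example.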
>
>
> -----Original Message-----
> From: Joel Halbert <joel@su3analytics.com>
> Reply-To: java-user@lucene.apache.org
> To: java-user@lucene.apache.org
> Subject: Re: scoring adjacent terms without proximity search
> Date: Sat, 31 Oct 2009 08:38:29 +0000
>
> Thank you all for your suggestions, I shall have a little think about
> the best way forward, and report back if I do anything interesting that
> works well.
>
> In answer to Grant's question, why not use PhraseQuery: we do not want
> to have an artificial upper limit on the slop, i.e. we do want to
> include documents that might only have a subset of words from the
> phrase. (e.g. just cheese, or just sandwich, but not both).
>
>
> -----Original Message-----
> From: Robert Muir <rcmuir@gmail.com>
> Reply-To: java-user@lucene.apache.org
> To: java-user@lucene.apache.org
> Subject: Re: scoring adjacent terms without proximity search
> Date: Fri, 30 Oct 2009 16:04:03 -0400
>
> > I suppose you could precompute the proximity associations by indexing
> > n-grams (in this case, Lucene calls them shingles), such that there
> > is a single token in your index containing cheese_sandwich (effectively)
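
To illustrate the idea, here is a rough sketch in plain Java of what shingling does to a token stream: each adjacent pair of words is also emitted as a single combined token. It is not Lucene's ShingleFilter, only an approximation of its effect; the class name is hypothetical and the underscore separator is an assumption taken from the cheese_sandwich example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {

    // Emits every original token plus a two-word shingle for each adjacent pair.
    // Assumption: "_" as separator, mirroring the cheese_sandwich example.
    static List<String> withBigrams(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + "_" + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: [toasted, toasted_cheese, cheese, cheese_sandwich, sandwich]
        System.out.println(withBigrams(Arrays.asList("toasted", "cheese", "sandwich")));
    }
}
```

A query for cheese_sandwich then matches only documents where the two words were adjacent at index time, with no positional work at search time.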
> >
> >
> doh, I see Grant already led you in this direction. (sorry for the
> duplicate mail)
> on average it's worked for me for some things like this.
>
> although, I'll try to contribute something actually useful, and mention
> that if you use things like shingles, it's good to consider modifying
> DefaultSimilarity, look at the setDiscountOverlaps param.
> otherwise, i've measured cases where injecting additional tokens will cause
> more harm than good, because it has an adverse effect on lengthNorm.
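
The effect on the length norm can be sketched with a little arithmetic. The snippet below assumes the 1/sqrt(numTerms) shape of DefaultSimilarity's length norm and assumes the injected shingles are emitted with a position increment of 0, so that they count as overlaps. Field boost and the lossy encoding of the stored norm are left out, and the class and method names are hypothetical.

```java
public class LengthNormSketch {

    // Simplified length norm: 1 / sqrt(number of terms counted for the field).
    // With discountOverlaps, tokens at position increment 0 are not counted.
    // Assumption: at least one non-overlapping token exists.
    static double lengthNorm(int totalTokens, int overlapTokens, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? totalTokens - overlapTokens : totalTokens;
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        // "toasted cheese sandwich": 3 words plus 2 injected shingles = 5 tokens.
        int total = 5;
        int overlaps = 2;
        System.out.println("shingles counted:    " + lengthNorm(total, overlaps, false));
        System.out.println("shingles discounted: " + lengthNorm(total, overlaps, true));
    }
}
```

For this three-word field the norm is roughly 0.45 when the shingles are counted and roughly 0.58 when they are discounted, which is the same value the field would get without shingles at all.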
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
