lucene-java-user mailing list archives

From Robert Muir <>
Subject Re: scoring adjacent terms without proximity search
Date Fri, 30 Oct 2009 20:04:03 GMT
> I suppose you could precompute the proximity associations by indexing
> n-grams (in this case, Lucene calls them shingles), such that there
> is a single token in your index containing cheese_sandwich (effectively).
doh, I see Grant already led you in this direction (sorry for the
duplicate mail).
On average it's worked for me for things like this.
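To make the shingle idea concrete, here is a minimal standalone sketch. It is not the Lucene API: the class ShingleSketch, the method unigramsAndBigrams and the "_" separator are hypothetical, chosen to match the cheese_sandwich example. It roughly mimics what a ShingleFilter configured to output unigrams emits: each word at position increment 1, followed by the bigram starting at the same position with increment 0. A query for the single term cheese_sandwich then rewards adjacency without a proximity query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch (not Lucene's ShingleFilter): emits
// unigrams plus stacked bigram shingles, tracking position increments.
public class ShingleSketch {

    // A token and its position increment relative to the previous token.
    record Token(String term, int posInc) {}

    // Each unigram gets posInc 1; the bigram starting at the same word gets
    // posInc 0, i.e. it is stacked on the unigram's position.
    static List<Token> unigramsAndBigrams(List<String> words, String sep) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(new Token(words.get(i), 1));
            if (i + 1 < words.size()) {
                out.add(new Token(words.get(i) + sep + words.get(i + 1), 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens =
            unigramsAndBigrams(List.of("grilled", "cheese", "sandwich"), "_");
        // Print "posInc<TAB>term" for each emitted token.
        for (Token t : tokens) {
            System.out.println(t.posInc() + "\t" + t.term());
        }
    }
}
```

Note that three words become five tokens here, which is what feeds into the length-norm concern below.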

Although, I'll try to contribute something actually useful and mention that
if you use things like shingles, it's good to consider modifying
DefaultSimilarity: look at the setDiscountOverlaps param (when enabled,
tokens with a position increment of 0 are not counted toward the field length).
Otherwise, I've measured cases where injecting additional tokens causes
more harm than good, because it has an adverse effect on lengthNorm.
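The lengthNorm effect can be shown with plain arithmetic. The sketch below is a hedged, standalone model of DefaultSimilarity's norm computation in the 2.9 era: lengthNorm is 1/sqrt(numTerms), and numTerms drops the zero-increment (overlapping) tokens only when discountOverlaps is true. The class LengthNormSketch is hypothetical; field boosts and the lossy one-byte norm encoding are ignored. In a real index you would instead call setDiscountOverlaps(true) on the DefaultSimilarity used when indexing.

```java
// Hypothetical standalone model (not the Lucene API) of the length-norm
// arithmetic: 1/sqrt(numTerms), optionally discounting overlapping tokens.
// Boosts and the single-byte norm encoding are deliberately left out.
public class LengthNormSketch {

    // length: all tokens in the field; numOverlap: tokens with posInc 0.
    static float lengthNorm(int length, int numOverlap, boolean discountOverlaps) {
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // "grilled cheese sandwich" plus two stacked bigram shingles:
        // 5 tokens in total, 2 of them at position increment 0.
        int length = 5;
        int overlaps = 2;
        System.out.println("counting shingles:    " + lengthNorm(length, overlaps, false));
        System.out.println("discounting overlaps: " + lengthNorm(length, overlaps, true));
    }
}
```

Without discounting, the shingled field is normed as if it had five terms, so it scores lower than an unshingled three-word field would; with discounting, its norm matches the three-word field again.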

Robert Muir
