lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven A Rowe" <sar...@syr.edu>
Subject RE: matching sub phrases in user entered query...
Date Mon, 14 Jul 2008 18:03:29 GMT
Hi Preetam,

On 07/14/2008 at 1:40 PM, Preetam Rao wrote:
> Is there a query in Lucene which matches sub phrases ?
> 
[snip]
> 
> I was redirected to Shingle filter which is a token filter
> that spits out n-grams. But it does not seem to be best solution
> since one does not know in advance what n in n-grams should be.

You could guess at the useful range, though, and then have ((max n)-(min n)+1) fields, scaling
the boost for each with the corresponding value of n.  

Just using 2-grams could be good enough, since the longer the sub-phrase match, the more matching
2-grams.

> Also it means one has to get all these bi grams and then construct
> a boolean OR query which is not very efficient either.

In terms of your requirements, though, I think you're stuck with this inefficiency, no matter
what solution you end up with; you need to do some form of term combination in your queries.
 And the ShingleFilter approach doesn't compare badly here, since positions for phrase queries
don't have to be looked up during scoring.  

If index space efficiency is a concern, though, the one-field-per-value-of-n solution I mentioned
above could pose a problem.

Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message