lucene-java-user mailing list archives

From Matt Ronge <mro...@theronge.com>
Subject Re: Pre-filtering for expensive query
Date Wed, 03 Sep 2008 16:06:57 GMT

On Aug 30, 2008, at 3:01 PM, Paul Elschot wrote:

> On Saturday 30 August 2008 18:19:09, Matt Ronge wrote:
>> On Aug 30, 2008, at 4:43 AM, Karl Wettin wrote:
>>> Can you tell us a bit more about what your custom query does?
>>> Perhaps you can build the "candidate filter" and reuse it over and
>>> over again?
>>
>> I cannot reuse it. The candidate filter would be constructed by first
>> running a boolean query with a number of SHOULD clauses. So then I
>> know which docs at least contain the terms I'm looking for. Once I have
>> this set, I will look at the ordering of the matches (it's a bit more
>> sophisticated than just a phrase query) and find the final matches.
>
> Sounds like you may want to take a look at SpanNearQuery.

I'm going to take a second look at SpanNearQuery. I need it to support  
optional tokens, so I'm guessing I'll need to create a subclass to do  
that.
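
To make this concrete, here is roughly where I'd start (an untested
sketch; the "body" field and the terms are just placeholders):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    // Match "hello" and "world" within 2 positions of each other, in order.
    SpanQuery buildNear() {
        SpanQuery[] clauses = new SpanQuery[] {
            new SpanTermQuery(new Term("body", "hello")),
            new SpanTermQuery(new Term("body", "world"))
        };
        return new SpanNearQuery(clauses, 2, true);
    }

The optional-token handling is the part I'd have to add in a subclass
(or maybe approximate with slop).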

>
>> Since my boolean clauses are different for each query I can't reuse
>> the filter.
>
> With (a variation of) SpanNearQuery you may end up not needing
> any filtering at all, because it already uses skipTo() where possible.
>
> In case you are looking for documents that contain partial phrases
> from an input query that has more than 2 words, have a look at Nutch.

I poked around in the Nutch docs and Javadocs; what should I look at
in Nutch? What does it do exactly? Is it the trick Doug Cutting
mentioned, where you concatenate neighboring terms so that "Hello
world" becomes the single token hello.world?

Thanks for everyone's help so far, I really appreciate it,
--
Matt

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

