lucene-dev mailing list archives

From Khash Sajadi <kh...@sajadi.co.uk>
Subject Re: Using filters to speed up queries
Date Sun, 24 Oct 2010 10:34:07 GMT
Here is what I've found so far:

Each query combines three constraints:
- ACCOUNT must be xxx
- The user query
- DATE RANGE must fall in (a,b); the date is indexed as a NumericField

I tried the following combinations (all using a BooleanQuery with the user
query added to it)

1. One:
- Add ACCOUNT as a TermQuery
- Add DATE RANGE as Filter

2. Two:
- Add ACCOUNT as Filter
- Add DATE RANGE as NumericRangeQuery

In both scenarios I tried caching the filter, and I also tried wrapping the
query in a ConstantScoreQuery.

I got the best result (about 4x faster) by using a cached filter for the
DATE RANGE and leaving the ACCOUNT as a TermQuery.
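
To illustrate why a cached filter pays off, here is a rough plain-Java model
of the idea. It is not Lucene code: the class name, the dates array and the
string cache key are all made up for the sketch, and the range is treated as
inclusive, which is an assumption. As far as I understand it, Lucene's
CachingWrapperFilter does something along these lines per IndexReader.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of a cached range filter; hypothetical, not Lucene API. */
final class CachedRangeFilterSketch {
    private final long[] dates;                       // dates[docId] = timestamp
    private final Map<String, BitSet> cache = new HashMap<>();
    int computations = 0;                             // counts full scans, for illustration

    CachedRangeFilterSketch(long[] dates) {
        this.dates = dates;
    }

    /** Returns the doc ids whose date lies in [from, to] (inclusive by assumption). */
    BitSet docsInRange(long from, long to) {
        String key = from + ".." + to;
        BitSet bits = cache.get(key);
        if (bits == null) {
            // Cache miss: scan every document once and remember the result.
            computations++;
            bits = new BitSet(dates.length);
            for (int doc = 0; doc < dates.length; doc++) {
                if (dates[doc] >= from && dates[doc] <= to) {
                    bits.set(doc);
                }
            }
            cache.put(key, bits);
        }
        // Hand out a copy so callers cannot modify the cached bits.
        return (BitSet) bits.clone();
    }
}
```

The cache only helps when the same range is asked for again. If (a,b) is
different on every search, each call is a miss and the scan is repeated.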

I think I'm happy with this approach. However, the security risk Uwe
mentioned when using ACCOUNT as a Query makes me nervous. Any suggestions?

As for document distribution, documents are spread fairly evenly across the
ACCOUNTs.

Also, I would still like to try the multi-index approach, but I am not sure
about the memory and file-handle burden of having potentially thousands of
readers/writers/searchers open at the same time. I use two processes, one as
indexer and one for search, with the same underlying FSDirectory. For
search, I use writer.getReader().reopen within a SearchManager, as suggested
by Lucene in Action.




On 24 October 2010 10:27, Paul Elschot <paul.elschot@xs4all.nl> wrote:

> On Sunday 24 October 2010 00:18:48, Khash Sajadi wrote:
> > My index contains documents for different users. Each document has the
> > user id as a field on it.
> >
> > There are about 500 different users with 3 million documents.
> >
> > Currently I'm calling Search with the query (parsed from user)
> > and FieldCacheTermsFilter for the user id.
> >
> > It works but the performance is not great.
> >
> > Ideally, I would like to perform the search only on the documents that
> > are relevant, this should make it much faster. However, it seems
> > Search(Query, Filter) runs the query first and then applies the filter.
> >
> > Is there a way to improve this? (i.e. run the query only on a subset of
> > documents)
> >
> > Thanks
> >
>
> When running the query with the filter, the query is run at the same time
> as the filter. Initially and after each matching document, the filter is
> assumed to be cheaper to execute and its first or next matching document is
> determined. Then the query and the filter are repeatedly advanced to each
> other's next matching document until they are at the same document (i.e.
> there is a match), similar to a boolean query with two required clauses.
> The Java code doing this is in the private method
> IndexSearcher.searchWithFilter().
>
> It could be that filling the field cache is the performance problem.
> How is the performance when this search call with the FieldCacheTermsFilter
> is repeated?
>
> Also, for a single indexed term to be used as a filter (the user id in this
> case) there may be no need for a cache, a QueryWrapperFilter around the
> TermQuery might suffice.
>
> Regards,
> Paul Elschot
>
>
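
The way Paul describes the query and the filter advancing to each other's
next matching document can be sketched in plain Java as below. This is only
a model of the idea, not the actual IndexSearcher.searchWithFilter() code:
the LeapfrogSketch and DocIds names are invented for the sketch, and sorted
int arrays stand in for the real iterators.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of leapfrog intersection; hypothetical names, not Lucene code. */
final class LeapfrogSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal iterator over ascending doc ids. */
    static final class DocIds {
        private final int[] docs; // ascending, no duplicates
        private int pos = 0;

        DocIds(int... docs) {
            this.docs = docs;
        }

        /** Moves to the first doc id >= target, or returns NO_MORE_DOCS. */
        int advance(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Collects the doc ids matched by both the filter and the query. */
    static List<Integer> intersect(DocIds filter, DocIds query) {
        List<Integer> hits = new ArrayList<>();
        // The filter moves first because it is assumed to be the cheaper side.
        int filterDoc = filter.advance(0);
        while (filterDoc != NO_MORE_DOCS) {
            int queryDoc = query.advance(filterDoc);
            if (queryDoc == NO_MORE_DOCS) {
                break;
            }
            if (queryDoc == filterDoc) {
                hits.add(filterDoc);                       // both sides agree: a match
                filterDoc = filter.advance(filterDoc + 1); // filter finds its next doc
            } else {
                filterDoc = filter.advance(queryDoc);      // filter catches up to the query
            }
        }
        return hits;
    }
}
```

For example, intersecting a filter matching docs 1, 3, 5, 7, 9 with a query
matching docs 2, 3, 4, 7, 10 yields [3, 7]. Neither side is run to completion
before the other, which is the point Paul makes about Search(Query, Filter).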
