lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Using filters to speed up queries
Date Sun, 24 Oct 2010 11:31:41 GMT
Some further speedup may be possible when the same combination of
filters (user account and date range here) is reused for another query.
The combined filter can then be built as an OpenBitSetDISI
(in the util package) and kept around for reuse.
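
To illustrate the reuse idea above, here is a minimal plain-Java sketch. It is not Lucene code: java.util.BitSet stands in for OpenBitSetDISI, and the class name, the string key, and the helper methods are all hypothetical. It only shows the shape of the idea: intersect the two filters' document sets once, then return the stored result for later queries that use the same combination.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustration only: java.util.BitSet stands in for Lucene's OpenBitSetDISI,
// and every name in this class is hypothetical.
class CombinedFilterCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    int builds = 0; // how many times a combined bit set was actually computed

    // Returns the intersection of the two doc-id sets, computing it at most
    // once per key. The key is assumed to identify the filter combination,
    // for example "account=42|date=[a,b]". Callers must not modify the result.
    BitSet combined(String key, Supplier<BitSet> accountDocs, Supplier<BitSet> dateDocs) {
        BitSet cached = cache.get(key);
        if (cached != null) {
            return cached; // reuse: the suppliers are not even invoked
        }
        BitSet result = (BitSet) accountDocs.get().clone(); // leave the input untouched
        result.and(dateDocs.get());                         // keep docs present in both
        builds++;
        cache.put(key, result);
        return result;
    }

    // Small helper to build a bit set from a list of doc ids.
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        CombinedFilterCache c = new CombinedFilterCache();
        BitSet account = bits(1, 3, 5, 7);
        BitSet dates = bits(3, 4, 5, 6);
        System.out.println(c.combined("account=42|date=[a,b]", () -> account, () -> dates));
        System.out.println(c.combined("account=42|date=[a,b]", () -> account, () -> dates));
        System.out.println("builds: " + c.builds);
    }
}
```

In a real index the stored bits would only be valid for the index reader they were computed from, so such a cache would need to be dropped or rebuilt when the reader is reopened.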

Regards,
Paul Elschot

On Sunday 24 October 2010 12:34:07, Khash Sajadi wrote:
> Here is what I've found so far:
> 
> I have three main constraints to use in a query:
> - Account MUST be xxx
> - User query
> - Date MUST be in the range (a,b); it is a NumericField
> 
> I tried the following combinations (all using a BooleanQuery with the user
> query added to it)
> 
> 1. One:
> - Add ACCOUNT as a TermQuery
> - Add DATE RANGE as Filter
> 
> 2. Two:
> - Add ACCOUNT as Filter
> - Add DATE RANGE as NumericRangeQuery
> 
> I tried caching the filters in both scenarios.
> I also tried both scenarios with the query wrapped in a
> ConstantScoreQuery.
> 
> I got the best result (about 4x faster) by using a cached filter for the
> DATE RANGE and leaving the ACCOUNT as a TermQuery.
> 
> I think I'm happy with this approach. However, the security risk Uwe
> mentioned when using ACCOUNT as a Query makes me nervous. Any suggestions?
> 
> As for document distribution, the ACCOUNTS have a similar distribution of
> documents.
> 
> Also, I would still like to try the multi-index approach, but I am not sure
> about the memory and file-handle burden of it (potentially having thousands
> of readers/writers/searchers open at the same time). I use two processes,
> one as indexer and one for search, with the same underlying FSDirectory. As
> for search, I use writer.getReader().reopen within a SearchManager as
> suggested by Lucene in Action.
> 
> 
> 
> 
> On 24 October 2010 10:27, Paul Elschot <paul.elschot@xs4all.nl> wrote:
> 
> > On Sunday 24 October 2010 00:18:48, Khash Sajadi wrote:
> > > My index contains documents for different users. Each document has the
> > > user id as a field on it.
> > >
> > > There are about 500 different users with 3 million documents.
> > >
> > > Currently I'm calling Search with the query (parsed from user)
> > > and FieldCacheTermsFilter for the user id.
> > >
> > > It works but the performance is not great.
> > >
> > > Ideally, I would like to perform the search only on the documents that
> > > are relevant; this should make it much faster. However, it seems
> > > Search(Query, Filter) runs the query first and then applies the filter.
> > >
> > > Is there a way to improve this? (i.e. run the query only on a subset of
> > > documents)
> > >
> > > Thanks
> > >
> >
> > When running the query with the filter, the query is run at the same time
> > as the filter. Initially and after each matching document, the filter is
> > assumed to be cheaper to execute, so its first or next matching document
> > is determined first. Then the query and the filter are repeatedly advanced
> > to each other's next matching document until they are at the same document
> > (i.e. there is a match), similar to a boolean query with two required
> > clauses.
> > The Java code doing this is in the private method
> > IndexSearcher.searchWithFilter().
> >
> > It could be that filling the field cache is the performance problem.
> > How is the performance when this search call with the FieldCacheTermsFilter
> > is repeated?
> >
> > Also, for a single indexed term to be used as a filter (the user id in
> > this case) there may be no need for a cache; a QueryWrapperFilter around
> > the TermQuery might suffice.
> >
> > Regards,
> > Paul Elschot
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org
> >
> >
> 
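
The way the quoted reply describes the query and the filter being advanced to each other's next matching document can be sketched in plain Java. This is an assumption-laden illustration, not the code in IndexSearcher.searchWithFilter(): java.util.BitSet stands in for both the filter's and the query's document iterators, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustration only: BitSet.nextSetBit(n) plays the role of "advance to the
// first matching document at or after n". All names here are hypothetical.
class LeapfrogSketch {
    // Returns the doc ids matched by both the filter and the query.
    static List<Integer> intersect(BitSet filterDocs, BitSet queryDocs) {
        List<Integer> matches = new ArrayList<>();
        // The filter is assumed to be cheaper, so it is positioned first.
        int f = filterDocs.nextSetBit(0);
        while (f >= 0) {
            // Advance the query to the filter's current document or beyond.
            int q = queryDocs.nextSetBit(f);
            if (q < 0) {
                break; // query exhausted, no further matches possible
            }
            if (q == f) {
                matches.add(f);                   // both sit on the same document
                f = filterDocs.nextSetBit(f + 1); // filter moves on first again
            } else {
                // Query overshot, so advance the filter to the query's position.
                f = filterDocs.nextSetBit(q);
            }
        }
        return matches;
    }

    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet filter = bits(2, 5, 9, 14);
        BitSet query = bits(1, 2, 3, 9, 10, 14, 20);
        System.out.println(intersect(filter, query));
    }
}
```

The point of the sketch is that neither side is run to completion before the other: each one skips ahead to the other's position, so documents excluded by the filter are never scored by the query.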


