lucene-java-user mailing list archives

From Antony Bowesman <...@teamware.com>
Subject Re: Performance between Filter and HitCollector?
Date Thu, 15 Mar 2007 04:27:36 GMT
Thanks for the detailed response, Hoss.  That's the sort of in-depth golden nugget 
I'd like to see in a copy of LIA 2 when it becomes available...

I've wanted to use Filter to cache some of my TermQuery results, as it looked 
faster for straight TermQuery searches, but Solr's DocSet interface abstraction 
is more useful.  HashDocSet will probably satisfy 90% of my cache.

Index DBs will typically be in the 1-3 million document range, but for mail 
that is spread over 1-6K users, so caching lots of BitSets for that number of 
users is not practical!
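
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes one full-size java.util.BitSet per user and takes the upper ends of the ranges above (3 million documents, 6K users); the class and method names are made up for the illustration, and BitSet object overhead is ignored.

```java
// Rough memory estimate for caching one full-size BitSet per user.
// Names and figures are illustrative assumptions, not measured values.
final class BitSetCacheEstimate {
    // A BitSet needs about one bit per document in the index.
    static long bytesPerBitSet(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // Total for one cached BitSet per user.
    static long totalBytes(long maxDoc, long users) {
        return bytesPerBitSet(maxDoc) * users;
    }

    public static void main(String[] args) {
        long maxDoc = 3_000_000L; // upper end of the index size range
        long users = 6_000L;      // upper end of the user count range
        System.out.println(bytesPerBitSet(maxDoc) + " bytes per BitSet");
        System.out.println(totalBytes(maxDoc, users) + " bytes for all users");
    }
}
```

That works out to 375,000 bytes per BitSet and about 2.25 GB for one BitSet per user, regardless of how few documents each one actually matches.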

I ended up creating a DocSetFilter that builds a DocSet (a la Solr) from the 
BitSet and caches the DocSet.  I then convert it back to a BitSet during 
Filter.bits().  Not the best solution, but the typical hit size is small, so 
the iteration is fast.
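
The round trip looks roughly like the sketch below. This is not my actual DocSetFilter and it does not use Solr's DocSet or HashDocSet classes; SmallDocSet is a hypothetical stand-in that stores the matching doc ids in a sorted int[], using only java.util.BitSet.

```java
import java.util.BitSet;

// Hypothetical compact doc set: the matching doc ids in a sorted int[].
// Costs about 4 bytes per hit instead of one bit per document in the index.
final class SmallDocSet {
    private final int[] docs;

    private SmallDocSet(int[] docs) {
        this.docs = docs;
    }

    // Build the compact form from a BitSet of matching documents.
    static SmallDocSet fromBitSet(BitSet bits) {
        int[] docs = new int[bits.cardinality()];
        int i = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            docs[i++] = doc;
        }
        return new SmallDocSet(docs);
    }

    // Rebuild a BitSet sized for the index; this is the step that would
    // run each time the filter is asked for its bits.
    BitSet toBitSet(int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc : docs) {
            bits.set(doc);
        }
        return bits;
    }

    int size() {
        return docs.length;
    }
}
```

The conversion back is one loop over the hits, so its cost depends on the hit count and not on the index size (apart from allocating the empty BitSet).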

Thanks eks dev for the info about Lucene-584 - that looks like an interesting 
set of patches.

Antony

Chris Hostetter wrote:
> it's kind of an Apples/Oranges comparison .. in the examples you gave
> below, one is executing an arbitrary query (which could be anything) the
> other is doing a simple TermEnumeration.
> 
> Assuming that Query is a TermQuery, the Filter is theoretically going to be
> faster because it doesn't have to compute any Scores ... generally speaking
> a Filter will always be a little faster than a functionally equivalent
> Query for the purposes of building up a simple BitSet of matching
> documents because the Query involves the score calculations ... but the
> Query is generally more usable.
> 
> The Query can also be more efficient in other ways, because the
> HitCollector doesn't *have* to build a BitSet, it can deal with the
> results in whatever way it wants (whereas a Filter always generates a
> BitSet).
> 
> Solr goes the HitCollector route for a few reasons:
>   1) allows us to use the DocSet abstraction which allows other
>      performance benefits over straight BitSets
>   2) allows us to have simpler code that builds DocSets and DocLists
>      (DocLists know about scores, sorting, and pagination) in a single
>      pass when scores or sorting are requested.
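
The point about a collector handling each hit however it wants can be sketched as follows. HitCallback is a hypothetical stand-in for a per-hit callback, not Lucene's HitCollector class, and IdScoreCollector is likewise invented for the illustration; it records doc ids and scores together in one pass without ever building a BitSet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-hit callback, standing in for the collector idea.
interface HitCallback {
    void collect(int doc, float score);
}

// Records ids and scores side by side as hits arrive, in a single pass.
final class IdScoreCollector implements HitCallback {
    final List<Integer> docs = new ArrayList<>();
    final List<Float> scores = new ArrayList<>();

    @Override
    public void collect(int doc, float score) {
        docs.add(doc);
        scores.add(score);
    }
}
```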


