lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thom Nelson" <thomnel...@gmail.com>
Subject Re: Cache BitSet or doc number?
Date Sun, 28 Oct 2007 23:58:33 GMT
It all depends on whether you want to cache the top n doc ids or to cache
some sort of filter.  Solr uses DocSets (unordered by design) to filter
search results in a HitCollector, and there is also the Lucene Filter
mechanism which relies on BitSet.  Decoupling Filter from BitSet is a great
thing to do, but if all you want to do is cache the search results and not a
Filter, your idea of just storing ordered doc ids is probably fine.  If you
store them as VInts you can keep Integer-level precision and save some
memory.

On 10/27/07, Paul Elschot <paul.elschot@xs4all.nl> wrote:
>
> Have a look at decoupling Filter from BitSet:
>
> http://issues.apache.org/jira/browse/LUCENE-584
>
> There also is a SortedVIntList there that stores document numbers
> more compactly than BitSet,  and an implementation of
> CachingFilterQuery (iirc) that chooses the more compact representation
> of BitSet and SortedVIntList.
>
> Regards,
> Paul Elschot
>
>
> On Saturday 27 October 2007 02:15:48 Yonik Seeley wrote:
> > On 10/26/07, John Patterson <jdp2000@gmail.com> wrote:
> > > Thom Nelson wrote:
> > > > Check out the HashDocSet from Solr, this is the best way to cache
> small
> > > > sets of search results.  In general, the Solr BitSet/DocSet classes
> are
> > > > more efficient than using the standard java.util.BitSet.  You can
> use
> > > > these independent of the rest of Solr (though I recommend checking
> out
> > > > Solr if you want to do complex caching).
> > >
> > > I imagine the fastest way to combine cached results is to store them
> in
> > > an array ordered by doc number so that the ConjunctionQuery can use
> them
> > > directly.  The Javadoc for HashDocSet says that they are stored out of
> > > order which would make this impossible.
> >
> > You're speaking at quite an abstract level... it really depends on
> > what specific issue you are seeing that you're trying to solve.
> >
> > -Yonik
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message