lucene-dev mailing list archives

From jian chen <>
Subject Re: BitSet implementation and large index
Date Mon, 14 Feb 2005 17:31:06 GMT

In database systems, there is a type of index called a bitmap index.
A BitSet implementation could borrow ideas from database engine
implementations.

You could squeeze consecutive 0's together and record only how many
0's are in each run, which could save a lot of memory.
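As a rough illustration of squeezing the 0's together, here is a minimal run-length-encoded bit set in Java. It is a sketch under stated assumptions: the class and method names (`RunLengthBitSet`, `storedRuns`) are hypothetical, not part of Lucene or the JDK, and this is plain run-length encoding rather than any specific scheme from the database literature. The encoding stores alternating run lengths (0's, then 1's, then 0's, ...), so a gap of any length between set bits costs a single int.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical run-length-encoded bit set (a sketch, not a Lucene class).
 * runs[0] = leading 0's, runs[1] = following 1's, runs[2] = 0's, ...
 */
public class RunLengthBitSet {
    private final int[] runs;

    /** Builds the encoding from strictly increasing set-bit positions. */
    public RunLengthBitSet(int[] sortedSetBits) {
        List<Integer> out = new ArrayList<>();
        int pos = 0; // first position not yet encoded
        int i = 0;
        while (i < sortedSetBits.length) {
            int start = sortedSetBits[i];
            out.add(start - pos);            // run of 0's before this block of 1's
            int end = start;
            while (i + 1 < sortedSetBits.length && sortedSetBits[i + 1] == end + 1) {
                i++;                         // extend the block of consecutive 1's
                end++;
            }
            out.add(end - start + 1);        // run of 1's
            pos = end + 1;
            i++;
        }
        runs = out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Returns true if bit 'index' is set; walks the runs linearly. */
    public boolean get(int index) {
        int pos = 0;
        for (int r = 0; r < runs.length; r++) {
            pos += runs[r];
            if (index < pos) {
                return r % 2 == 1;           // odd-numbered runs hold 1's
            }
        }
        return false;                        // implicit trailing 0's
    }

    /** Number of set bits: the sum of all 1-runs. */
    public int cardinality() {
        int n = 0;
        for (int r = 1; r < runs.length; r += 2) {
            n += runs[r];
        }
        return n;
    }

    /** Number of ints used by the encoding. */
    public int storedRuns() {
        return runs.length;
    }

    public static void main(String[] args) {
        RunLengthBitSet bits = new RunLengthBitSet(new int[] {5, 6, 7, 1_000_000});
        System.out.println(bits.get(6));         // true
        System.out.println(bits.get(8));         // false
        System.out.println(bits.cardinality());  // 4
        System.out.println(bits.storedRuns());   // 4
    }
}
```

Here four set bits spread over a million positions are stored in four ints. Lookups walk the runs linearly; a more practical version would also keep cumulative offsets so `get` could binary-search them.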

There are various algorithms for this kind of bitset compression. A
good reference book is "Database System Implementation" by Ullman and
two other professors at Stanford.



On Mon, 14 Feb 2005 09:29:26 -0600 (CST),
<> wrote:
> It seems that for a huge index, it might be a good idea to use a different
> implementation of the BitSet when doing filtering (assuming the
> non-filtered set is relatively small).  This would really help minimize
> the memory required for each filter operation.
> Since the default implementation of BitSet allocates enough memory for
> each position in the set, it seems overkill for a set that has a small
> number of "on" values.
> Any thoughts?
> Tony Schwartz
> By the way, I just started using Lucene about 2 weeks ago, and I am really
> loving it.  The sky's the limit for this framework.  Thanks very much to
> those of you involved in its development.  Extremely powerful!
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
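The alternative suggested in the quoted message, a filter representation that only pays for the "on" values, can be sketched as a sorted array of matching document ids. This is a minimal Java sketch: `SortedIntDocSet` and its methods are hypothetical names, not Lucene classes, and it does not plug into Lucene's filter API. Rough memory arithmetic: a dense bit set over maxDoc documents needs about maxDoc/8 bytes, while the sorted array needs 4 bytes per match, so the array is smaller whenever fewer than about maxDoc/32 documents match (312,500 out of 10,000,000, for example).

```java
import java.util.Arrays;

/**
 * Hypothetical sparse doc-id set for filters that match few documents.
 * Instead of one bit per document in the index, it stores only the ids
 * of matching documents in a sorted int[].
 */
public class SortedIntDocSet {
    private final int[] docs; // matching doc ids, sorted and de-duplicated

    public SortedIntDocSet(int[] matchingDocs) {
        this.docs = Arrays.stream(matchingDocs).sorted().distinct().toArray();
    }

    /** Membership test by binary search: O(log n) rather than a direct bit lookup. */
    public boolean get(int doc) {
        return Arrays.binarySearch(docs, doc) >= 0;
    }

    public int cardinality() {
        return docs.length;
    }

    /** Approximate payload: 4 bytes per matching document. */
    public long payloadBytes() {
        return 4L * docs.length;
    }

    /** Approximate payload of a dense bit set: one bit per document, rounded up to 64-bit words. */
    public static long bitSetPayloadBytes(int maxDoc) {
        return 8L * ((maxDoc + 63L) / 64);
    }

    public static void main(String[] args) {
        SortedIntDocSet hits = new SortedIntDocSet(new int[] {42, 7, 9_999_999});
        System.out.println(hits.get(42));                    // true
        System.out.println(hits.get(43));                    // false
        System.out.println(hits.payloadBytes());             // 12
        System.out.println(bitSetPayloadBytes(10_000_000));  // 1250000
    }
}
```

The trade-off is slower membership tests (binary search instead of indexing a bit directly), which matters less when the filtered set is small.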
