lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antony Bowesman <...@thorntothehorn.org>
Subject DocIdSet to represent small numberr of hits in large Document set
Date Tue, 05 Apr 2011 06:24:29 GMT
I'm converting a Lucene 2.3.2 to 2.4.1 (with a view to going to 2.9.4).

Many of our indexes are 5M+ Documents, however, only a small subset of these are 
relevant to any user.  As a DocIdSet, backed by a BitSet or OpenBitSet, is 
rather inefficient in terms of memory use, what is the recommended way to 
DocIdSet implementation to use in this scenario?

Seems like SortedVIntList can be used to store the info, but it has no methods 
to build the list in the first place, requiring an array or bitset in the 
constructor.

I had used Nutch's DocSet and HashDocSet implementations in my 2.3.2 deployment, 
but want to move away from that Nutch dependency, so wondered if Lucene had a 
way to do this?

Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message