lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: DocSet: BitDocSet or HashDocSet ?
Date Wed, 29 Oct 2008 23:01:36 GMT

:  The doc of HashDocSet says "It can be a better choice if there are few
: docs in the set". What does 'few' mean in this context?

it's relative to the total size of your index.  if you have a million docs, 
but you are dealing with DocSets that are only going to contain 10 docs, 
then a HashDocSet is probably going to need less memory and give faster 
lookups than a BitDocSet.
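
to put rough numbers on the million-doc example, here is a minimal
back-of-the-envelope sketch.  it does not use Solr's classes: the class and
method names are made up for illustration, and the hash table layout (4-byte
int slots, power-of-two table, 0.75 load factor) is an assumption, not a
description of how HashDocSet is actually implemented.

```java
public class DocSetSizeSketch {

    // Bytes for a one-bit-per-document set: the cost depends on the size of
    // the index (maxDoc), not on how many docs are actually in the set.
    public static long bitSetBytes(int maxDoc) {
        long words = ((long) maxDoc + 63) / 64; // round up to 64-bit words
        return words * 8;
    }

    // Bytes for a hypothetical open-addressed table of int doc ids.
    // ASSUMPTIONS: 4 bytes per slot, slot count rounded up to a power of
    // two, load factor 0.75. Object headers are ignored.
    public static long hashSetBytes(int setSize) {
        int needed = (int) Math.ceil(setSize / 0.75);
        int slots = 1;
        while (slots < needed) {
            slots <<= 1;
        }
        return slots * 4L;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        for (int setSize : new int[] {10, 1_000, 100_000}) {
            System.out.printf("set size %d: bits=%d bytes, hash=%d bytes%n",
                    setSize, bitSetBytes(maxDoc), hashSetBytes(setSize));
        }
    }
}
```

under these assumptions the bit-based estimate stays constant for a given
index size while the hash-based estimate grows with the set, which is why
"few" only makes sense relative to the total number of docs.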

exactly where the sweet spot is, as far as size and speed go, is somewhat 
hard to pin down.

if i recall correctly from the way yonik implemented OpenBitSet, the size 
isn't purely a function of set size either ... a BitDocSet containing a 
thousand docs that are very "near" each other in the id space (ie: from a 
uniqueKey:[x TO y] type filter, or even a date based filter where docs 
are generally indexed chronologically) might be more compact and faster 
than a HashDocSet of the same thousand docs, while a thousand docs 
scattered around the id space with lots of big gaps in the middle might 
be much bigger than an equivalent HashDocSet.

it's one of those things you have to experiment with.
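
as a starting point for that kind of experiment, here is a rough sketch of a
lookup comparison.  it uses java.util.BitSet and java.util.HashSet as
stand-ins, not Solr's BitDocSet and HashDocSet, so the absolute numbers will
not carry over; the class name, method names, sizes and seeds are all
arbitrary choices for illustration.  a single un-warmed timing like this is
only a crude indication.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class DocSetExperiment {

    // Picks `count` distinct pseudo-random doc ids in [0, maxDoc).
    // Requires count <= maxDoc, otherwise the loop never finishes.
    public static int[] randomIds(int maxDoc, int count, long seed) {
        Random rnd = new Random(seed);
        Set<Integer> seen = new HashSet<>();
        int[] ids = new int[count];
        int n = 0;
        while (n < count) {
            int id = rnd.nextInt(maxDoc);
            if (seen.add(id)) {
                ids[n++] = id;
            }
        }
        return ids;
    }

    // Counts how many probe ids are members of the bit-based set.
    public static long countHitsBits(BitSet bits, int[] probes) {
        long hits = 0;
        for (int id : probes) {
            if (bits.get(id)) hits++;
        }
        return hits;
    }

    // Counts how many probe ids are members of the hash-based set.
    public static long countHitsHash(Set<Integer> set, int[] probes) {
        long hits = 0;
        for (int id : probes) {
            if (set.contains(id)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;   // pretend index size
        int setSize = 1_000;      // docs in the set; vary this to experiment

        BitSet bits = new BitSet(maxDoc);
        Set<Integer> hash = new HashSet<>();
        for (int id : randomIds(maxDoc, setSize, 42L)) {
            bits.set(id);
            hash.add(id);
        }

        int[] probes = randomIds(maxDoc, 100_000, 7L);
        long t0 = System.nanoTime();
        long bitHits = countHitsBits(bits, probes);
        long t1 = System.nanoTime();
        long hashHits = countHitsHash(hash, probes);
        long t2 = System.nanoTime();

        System.out.printf("bits: %d hits in %d us%n", bitHits, (t1 - t0) / 1_000);
        System.out.printf("hash: %d hits in %d us%n", hashHits, (t2 - t1) / 1_000);
    }
}
```

changing setSize, or replacing the random ids with one contiguous range,
gives a feel for how set size and clustering affect the comparison.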


-Hoss

