lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: DocSet: BitDocSet or HashDocSet ?
Date Mon, 03 Nov 2008 20:13:56 GMT

On 28-Oct-08, at 5:36 AM, Jérôme Etévé wrote:

> Hi all,
>
> In my code, I'd like to keep a subset of my 14M docs; the subset is
> around 100k docs.
>
> In your view, what is the best option in terms of speed and
> memory usage?
>
> Some basic reasoning tells me the BitDocSet should be the fastest for
> lookup, but takes ~ 14M * sizeof(int) in memory, whereas
> the HashDocSet takes just ~ 100k * sizeof(int), but is a bit
> slower on lookup.
>
> The doc of HashDocSet says "It can be a better choice if there are few
> docs in the set". What does 'few' mean in this context?

Solr, by default, ships in a configuration that creates filters with
HashDocSet if the set holds fewer than 3000 docs, and BitDocSet otherwise.
This threshold is tunable in solrconfig.xml.  You might find it helps
to increase it slightly with 14M docs, say to 5000-6000.  In my
testing, any higher than this is a net loss.
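
For a rough sense of the numbers involved, here is a minimal back-of-the-envelope sketch in Java. It is not Solr's actual code: the class `DocSetSizing`, its method names, and the 0.75 load factor used in `main` are assumptions made up for illustration. It relies on one point worth noting: a bitset costs one bit per document in the index, not sizeof(int) per document, while an int-based hash set costs roughly one int per slot.

```java
// Illustrative sketch only; these names are hypothetical, not part of Solr.
public class DocSetSizing {

    /** Approximate bytes for a bitset with one bit per document in the index. */
    static long bitSetBytes(long maxDoc) {
        return (maxDoc + 7) / 8; // round up to whole bytes
    }

    /** Approximate bytes for an int hash table holding setSize ids at the given load factor. */
    static long hashSetBytes(long setSize, double loadFactor) {
        long slots = (long) Math.ceil(setSize / loadFactor);
        return slots * Integer.BYTES;
    }

    /** Size-based choice as described above: hash set below the threshold, bitset otherwise. */
    static String choose(int setSize, int hashMaxSize) {
        return setSize < hashMaxSize ? "HashDocSet" : "BitDocSet";
    }

    public static void main(String[] args) {
        long maxDoc = 14_000_000L; // index size from the question
        int setSize = 100_000;     // subset size from the question

        System.out.println("bitset bytes:   " + bitSetBytes(maxDoc));
        System.out.println("hash set bytes: " + hashSetBytes(setSize, 0.75)); // assumed load factor
        System.out.println("choice at 3000: " + choose(setSize, 3000));
    }
}
```

Under these assumptions the bitset for 14M docs comes to under 2 MB, and a 100k-doc subset is far above any threshold in the 3000-6000 range, so it would end up as a BitDocSet either way.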

-Mike