lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: Using DIH to import 10 million records
Date Mon, 05 Mar 2012 01:19:48 GMT
On 3/4/2012 3:31 AM, Sphene Software wrote:
> Folks,
> I am planning to use DIH for an index of 10 million records.
> I would like to know the following:
> - Can DIH scale to an index of this size?
> - If DIH is a bottleneck, what is the specific issue and how can it be
> addressed?

My entire index is about 67 million documents.  There are a total of
seven shards; six of them have over 11 million documents each.  I can do
a full dataimport (from MySQL) of those six shards simultaneously in
less than three hours.  The seventh shard has fewer than 500,000
documents and builds after the others during a full rebuild.  We rarely
have to do a full rebuild; it mostly happens when the schema changes.
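
As a rough illustration of that rebuild order (not the poster's actual
scripts), the sketch below starts a DIH full-import on each large shard,
polls until none of them report "busy", and then starts the small shard.
The core URLs are hypothetical, and it assumes the DIH handler is
registered at /dataimport and that the status response carries a
"status" field of "busy" or "idle"; check both against your own
solrconfig.xml and Solr version.

```python
import json
import time
import urllib.request


def dih_url(core_url, command):
    # Assumes the DIH request handler is registered at /dataimport.
    return f"{core_url.rstrip('/')}/dataimport?command={command}&wt=json"


def http_get_json(url):
    # Plain HTTP GET that parses the JSON response body.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def start_import(core_url, fetch=http_get_json):
    # full-import returns right away; the import itself runs inside Solr.
    fetch(dih_url(core_url, "full-import"))


def is_busy(core_url, fetch=http_get_json):
    # Assumption: the status response has "status": "busy" while importing.
    return fetch(dih_url(core_url, "status")).get("status") == "busy"


def full_rebuild(large_shards, small_shard, fetch=http_get_json,
                 poll_seconds=30, sleep=time.sleep):
    # Kick off every large shard so the imports run at the same time.
    for core_url in large_shards:
        start_import(core_url, fetch)
    # Wait until no large shard reports a running import.
    while any(is_busy(core_url, fetch) for core_url in large_shards):
        sleep(poll_seconds)
    # Only then build the small shard.
    start_import(small_shard, fetch)
```

A call might look like full_rebuild(["http://idx1:8983/solr/s1", ...],
"http://idx1:8983/solr/s7"), where the host and core names are
placeholders.  Because each full-import request returns immediately,
"simultaneous" here just means issuing the requests back to back; the
fetch and sleep parameters exist so the flow can be exercised without a
live Solr server.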

I use SolrJ for updates.  My experience with that so far suggests that
doing the full import with my SolrJ code would take significantly longer
than three hours.

