lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: Using DIH to import 10 million records
Date Mon, 05 Mar 2012 01:19:48 GMT
On 3/4/2012 3:31 AM, Sphene Software wrote:
> Folks,
>
> I am planning to use DIH for an index of size 10 million records.
>
> I would like to know the following;
> - Can DIH scale to an index of this size?
> - If DIH is a bottleneck, what is the specific issue and how can it be
> addressed?

My entire index is about 67 million documents, split across seven 
shards.  Six of them hold over 11 million documents each, and I can run 
a full dataimport (from MySQL) on those six simultaneously in less than 
three hours.  The seventh shard holds fewer than 500,000 documents and 
builds after the others during a full rebuild.  We rarely have to do a 
full rebuild; when we do, it is usually because of a schema change.
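
For anyone who wants to try the same pattern, here is a rough sketch of 
kicking off a full-import on several cores at once.  It is not my actual 
build script.  The host, the core names (shard1 through shard6) and the 
/dataimport handler path are assumptions; the path depends on how the 
handler is registered in your solrconfig.xml.  As far as I know the 
request returns right away and the import itself runs in the background, 
so you would poll command=status afterwards to see when each core is 
done.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical host and core names; substitute your own.
BASE_URL = "http://localhost:8983/solr"
SHARD_CORES = [f"shard{i}" for i in range(1, 7)]


def full_import_url(base_url, core, clean=True, commit=True):
    """Build the DIH full-import URL for one core.

    Assumes the handler is registered as /dataimport on each core.
    """
    params = urlencode({
        "command": "full-import",
        "clean": str(clean).lower(),
        "commit": str(commit).lower(),
    })
    return f"{base_url}/{core}/dataimport?{params}"


def start_full_imports(base_url, cores, fetch=None):
    """Send full-import to every core concurrently.

    Returns a dict mapping core name to the response body.  `fetch` is
    injectable so the function can be exercised without a running Solr.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=30) as resp:
                return resp.read().decode("utf-8")

    urls = {core: full_import_url(base_url, core) for core in cores}
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        futures = {core: pool.submit(fetch, url) for core, url in urls.items()}
        return {core: fut.result() for core, fut in futures.items()}


if __name__ == "__main__":
    for core, body in start_full_imports(BASE_URL, SHARD_CORES).items():
        print(core, len(body))
```

Whether running the imports in parallel actually helps depends on 
whether the database and the Solr machines can keep up with all of the 
shards at the same time.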

I use SolrJ for updates.  My experience with it so far suggests that 
doing the full import with my SolrJ code would take significantly longer 
than three hours.

Thanks,
Shawn

