lucene-solr-user mailing list archives

From Lance Norskog <>
Subject Re: Using DIH to import 10 million records
Date Mon, 05 Mar 2012 01:56:09 GMT
You can run the DIH with multiple threads feeding from the same query.
It also depends on document size: large documents may index faster when
they get their own threads. This may then interact with the new NRT
multi-commit code.
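A minimal data-config.xml sketch of the setup Lance describes. The data source, entity name, query, and credentials here are placeholders; the `threads` attribute on the root entity is the Solr 3.x DIH option for running a multi-threaded import (it was later removed in Solr 4.x, so check your version's documentation):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <!-- threads="4": four indexing threads consume rows from this entity -->
    <entity name="item" threads="4"
            query="SELECT id, title, body FROM items"/>
  </document>
</dataConfig>
```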

On Sun, Mar 4, 2012 at 5:19 PM, Shawn Heisey <> wrote:
> On 3/4/2012 3:31 AM, Sphene Software wrote:
>> Folks,
>> I am planning to use DIH for an index of 10 million records.
>> I would like to know the following:
>> - Can DIH scale to an index of this size?
>> - If DIH is a bottleneck, what is the specific issue, and how can it be
>> addressed?
> My entire index is about 67 million documents spread across seven
> shards; six of them hold over 11 million documents each.  I can do a full
> dataimport (from MySQL) of those six shards simultaneously in less than
> three hours.  The seventh shard is under 500,000 documents and builds
> after the others during a full rebuild.  A full rebuild is rare; we mostly
> do one when the schema changes.
> I use SolrJ for updates, and my experience so far suggests that doing
> the full import with my SolrJ code would take significantly longer than
> three hours.
> Thanks,
> Shawn
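Shawn's parallel rebuild of six shards can be sketched as a small script. The host, port, and core names below are placeholders, and this dry-run version only prints the DIH full-import URLs; pipe each one to curl (backgrounded with `&`) to actually kick off the imports concurrently:

```shell
# Hypothetical core names: emit the DIH full-import URL for each shard.
# To run the imports in parallel, replace echo with:
#   curl -s "$url" &   ... followed by a final `wait`
for core in shard1 shard2 shard3 shard4 shard5 shard6; do
  echo "http://localhost:8983/solr/$core/dataimport?command=full-import"
done
```

Each core runs its own DIH instance, so the imports proceed independently; the database and the Solr host's I/O are the usual limits on how many you can run at once.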

Lance Norskog
