On 3/4/2012 3:31 AM, Sphene Software wrote:
> Folks,
>
> I am planning to use DIH to build an index of 10 million records.
>
> I would like to know the following:
> - Can DIH scale to an index of this size?
> - If DIH is a bottleneck, what is the specific issue and how can it be
> addressed?
My entire index is about 67 million documents, split across seven
shards; six of them have over 11 million documents each. I can do a
full dataimport (from MySQL) of those six shards simultaneously in
less than three hours. The seventh shard has fewer than 500,000
documents and builds after the others during a full rebuild. We rarely
need to do a full rebuild; it's mostly at schema change time.
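A minimal sketch of kicking off simultaneous per-shard imports, in Python, assuming each shard is a separate Solr core with the DataImportHandler registered at `/dataimport` (a common name, but it is whatever that core's solrconfig.xml says). The host and core names in `SHARD_URLS` and the helper names are hypothetical. `command=full-import`, `clean` and `commit` are standard DIH request parameters. DIH runs the import in a background thread on each core, so these requests only start the work, and progress has to be checked afterwards with `command=status`.

```python
"""Hypothetical sketch: start a DIH full-import on several shards at once."""
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical shard core URLs; substitute your own hosts and core names.
SHARD_URLS = [f"http://solr-host:8983/solr/shard{i}" for i in range(1, 7)]


def full_import_url(core_url, handler="/dataimport"):
    """Build the DIH full-import request URL for one core.

    Assumes the DataImportHandler is registered at `handler`
    in that core's solrconfig.xml.
    """
    params = {"command": "full-import", "clean": "true", "commit": "true"}
    return f"{core_url}{handler}?{urlencode(params)}"


def start_import(core_url):
    # DIH runs the import in the background; this request only starts it,
    # so poll command=status afterwards to see when each shard finishes.
    with urlopen(full_import_url(core_url), timeout=30) as resp:
        return resp.status


def start_all(core_urls):
    # One request per shard, issued concurrently so all imports overlap.
    with ThreadPoolExecutor(max_workers=len(core_urls)) as pool:
        return list(pool.map(start_import, core_urls))
```

Calling `start_all(SHARD_URLS)` issues the six requests concurrently; the smaller seventh shard would be started separately once those imports report that they are idle again.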
I use SolrJ for updates. My experience with it so far suggests that
doing the full import with my SolrJ code would take significantly longer
than three hours.
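For scale, here is a back-of-envelope calculation of the throughput implied by the full-rebuild figures above, sketched in Python. Because the inputs are "over 11 million" documents per shard and "less than three hours", the results are lower bounds on the rate. The 10-million-record estimate assumes documents, database and hardware comparable to mine, which is a big assumption.

```python
"""Back-of-envelope DIH throughput implied by the full-rebuild figures."""

SECONDS = 3 * 60 * 60          # "less than three hours" (upper bound on time)
DOCS_PER_SHARD = 11_000_000    # "over 11 million" (lower bound on docs)
PARALLEL_SHARDS = 6            # shards imported simultaneously


def min_rate(docs, seconds):
    """Lower bound on documents indexed per second."""
    return docs / seconds


# Per-shard rate and combined rate across the six concurrent imports.
per_shard = min_rate(DOCS_PER_SHARD, SECONDS)
aggregate = min_rate(DOCS_PER_SHARD * PARALLEL_SHARDS, SECONDS)

# Hours for a single 10-million-record import at the per-shard rate,
# assuming comparable documents and hardware.
hours_for_10m = 10_000_000 / per_shard / 3600
```

That works out to more than about 1,000 documents per second per shard and more than about 6,100 per second across the six shards, so a single 10-million-record import would take under roughly 2.7 hours under those assumptions. Any SolrJ-based indexer would need to beat those rates to be faster.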
Thanks,
Shawn