lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Merging of index in Solr
Date Tue, 21 Nov 2017 16:10:48 GMT
I am using the IndexMergeTool that ships with Solr, via the command below:

java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar
org.apache.lucene.misc.IndexMergeTool

The heap size is 32GB. There are more than 20 million documents in the two
cores.

Regards,
Edwin



On 21 November 2017 at 21:54, Shawn Heisey <apache@elyograg.org> wrote:

> On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:
>
>> Does anyone know how long merging in Solr usually takes?
>>
>> I am currently merging about 3.5TB of data. It has been running for
>> more than 28 hours and has not completed yet. The merge is running on
>> an SSD.
>>
>
> The following will apply if you mean Solr's "optimize" feature when you
> say "merging".
>
> In my experience, merging proceeds at about 20 to 30 megabytes per second
> -- even if the disks are capable of far faster data transfer.  Merging is
> not just copying the data. Lucene is completely rebuilding very large data
> structures, and *not* including data from deleted documents as it does so.
> It takes a lot of CPU power and time.
>
> If we average the data rates I've seen to 25 MB/s, then that would indicate
> that an optimize on a 3.5TB index is going to take about 39 hours, and might take
> as long as 48 hours.  And if you're running SolrCloud with multiple
> replicas, multiply that by the number of copies of the 3.5TB index.  An
> optimize on a SolrCloud collection handles one shard replica at a time and
> works its way through the entire collection.
>
> If you are merging different indexes *together*, which a later message
> seems to state, then the actual Lucene operation is probably nearly
> identical, but I'm not really familiar with it, so I cannot say for sure.
>
> Thanks,
> Shawn
>
>
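
The time estimate in the quoted message can be reproduced with a quick
calculation. This is only a rough sketch: it assumes decimal units
(1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant sustained merge
rate, and the function name estimate_merge_hours is made up for
illustration. Real merge throughput will vary with CPU, schema and
index contents.

```python
def estimate_merge_hours(index_bytes: float, rate_mb_per_s: float) -> float:
    """Hours needed to rewrite index_bytes at a sustained rate in MB/s.

    Assumes decimal megabytes (10**6 bytes) and a constant rate.
    """
    seconds = index_bytes / (rate_mb_per_s * 1_000_000)
    return seconds / 3600


# Assumption: "3.5TB" is read as 3.5 * 10**12 bytes.
INDEX_BYTES = 3.5e12

# 20 to 30 MB/s is the range quoted above; 25 is its midpoint.
for rate in (20, 25, 30):
    hours = estimate_merge_hours(INDEX_BYTES, rate)
    print(f"{rate} MB/s -> {hours:.1f} hours")
```

At 25 MB/s this gives roughly 39 hours, and at 20 MB/s a little over
48 hours, which matches the figures above. For SolrCloud, the result
would be multiplied by the number of replicas.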
