lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <>
Subject Re: Merging of index in Solr
Date Thu, 23 Nov 2017 12:35:24 GMT
Hi Shawn,

Thanks for the info. We will most likely be doing sharding when we migrate
to Solr 7.1.0, and re-index the data.

But as Solr 7.1.0 is not yet ready to index EML files due to this
JIRA, we have to make do with our current Solr 6.5.1 for now, which
was created without sharding from the start.


On 23 November 2017 at 12:50, Shawn Heisey <> wrote:

> On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote:
>> I'm doing the merging on the SSD drive, the speed should be ok?
> The speed of virtually all modern disks will have almost no influence on
> the speed of the merge.  The bottleneck isn't disk transfer speed, it's the
> operation of the merge code in Lucene.
> As I said earlier in this thread, a merge is **NOT** just a copy. Lucene
> must completely rebuild the data structures of the index to incorporate all
> of the segments of the source indexes into a single segment in the target
> index, while simultaneously *excluding* information from documents that
> have been deleted.
> The best speed I have ever personally seen for a merge is 30 megabytes per
> second.  This is far below the sustained transfer rate of a typical modern
> SATA disk.  SSD is capable of far faster data transfer ...but it will NOT
> make merges go any faster.
>> We need to merge because the data are indexed in two different collections,
>> and we need them to be under the same collection, so that we can do things
>> like faceting more accurately.
>> Will sharding alone achieve this? Or do we have to merge first before we
>> do the sharding?
> If you want the final index to be sharded, it's typically best to index
> from scratch into a new empty collection that has the number of shards you
> want.  The merging tool you're using isn't aware of concepts like shards.
> It combines everything into a single index.
> It's not entirely clear what you're asking with the question about
> sharding alone.  Making a guess:  I have never heard of facet accuracy
> being affected by whether or not the index is sharded.  If that *is*
> possible, then I would expect an index that is NOT sharded to have better
> accuracy.
> Thanks,
> Shawn
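
To put the 30 megabytes per second figure quoted above in perspective, here is a rough back-of-the-envelope sketch in Python. The index sizes are hypothetical and not taken from this thread, and since 30 MB/s is described as the best speed observed, the result should be read as an optimistic lower bound on merge time rather than a prediction.

```python
def estimate_merge_hours(index_size_gb: float, rate_mb_per_s: float = 30.0) -> float:
    """Rough lower bound on merge time in hours at a given merge rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    size_mb = index_size_gb * 1024  # treat 1 GB as 1024 MB
    return size_mb / rate_mb_per_s / 3600  # seconds -> hours


# Hypothetical index sizes, for illustration only.
for size_gb in (100, 500, 1000):
    print(f"{size_gb} GB -> at least {estimate_merge_hours(size_gb):.1f} hours")
```

Because the rate is bounded by the merge code rather than by the disk, moving the index to faster storage would not be expected to change these numbers much.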

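The advice to index from scratch into a new, empty, sharded collection could start with a Collections API CREATE call. The sketch below only builds the request URL and does not send it. The host, collection name, and shard count are assumptions chosen for illustration, and it presumes Solr is running in SolrCloud mode.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, num_shards: int,
                          replication_factor: int = 1) -> str:
    """Build a Collections API CREATE URL for a new sharded collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name; adjust to the actual cluster.
print(create_collection_url("http://localhost:8983/solr", "combined", 2))
```

Once the empty collection exists, documents from both source collections would be re-indexed into it, so that Solr routes each document to a shard at index time instead of relying on a merged single index.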