lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Merging of index in Solr
Date Thu, 23 Nov 2017 04:50:56 GMT
On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote:
> I'm doing the merging on the SSD drive, the speed should be ok?

The speed of virtually all modern disks will have almost no influence on 
the speed of the merge.  The bottleneck isn't disk transfer speed, it's 
the operation of the merge code in Lucene.

As I said earlier in this thread, a merge is **NOT** just a copy. Lucene 
must completely rebuild the data structures of the index to incorporate 
all of the segments of the source indexes into a single segment in the 
target index, while simultaneously *excluding* information from 
documents that have been deleted.
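
To illustrate why this is more work than a copy, here is a toy sketch in Python. It is NOT Lucene's actual merge code or data format. The segment layout (a dict with `max_doc`, `postings`, and `deleted`) and the function name `merge_segments` are assumptions made up for this example. It only shows the idea: every live document gets a new id, every posting list is rewritten, and deleted documents are dropped along the way.

```python
from collections import defaultdict


def merge_segments(segments):
    """Merge toy segments into one, renumbering docs and dropping deletes.

    Each segment is a dict (hypothetical layout, not Lucene's):
      max_doc:  number of documents in the segment
      postings: term -> sorted list of segment-local doc ids
      deleted:  set of segment-local doc ids marked as deleted
    """
    merged = defaultdict(list)
    next_id = 0
    for seg in segments:
        # Assign a new id to every live document in this segment.
        id_map = {}
        for old_id in range(seg["max_doc"]):
            if old_id not in seg["deleted"]:
                id_map[old_id] = next_id
                next_id += 1
        # Rewrite every posting list, skipping deleted documents.
        for term, doc_ids in seg["postings"].items():
            for old_id in doc_ids:
                if old_id in id_map:
                    merged[term].append(id_map[old_id])
    return {
        "max_doc": next_id,
        "postings": dict(sorted(merged.items())),
        "deleted": set(),
    }


seg_a = {"max_doc": 3, "postings": {"solr": [0, 1], "merge": [1, 2]}, "deleted": {1}}
seg_b = {"max_doc": 2, "postings": {"solr": [0], "lucene": [1]}, "deleted": set()}
print(merge_segments([seg_a, seg_b]))
```

Even in this toy version every posting has to be read, remapped, and written again.  A real index has many more structures than a single term-to-document map, which is why the merge is bound by that processing and not by how fast the disk can move bytes.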

The best speed I have ever personally seen for a merge is 30 megabytes
per second.  This is far below the sustained transfer rate of a typical
modern SATA disk.  An SSD is capable of far faster data transfer, but it
will NOT make merges go any faster.
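
For a rough idea of what that rate means in wall-clock time, here is a small Python estimate.  The 30 MB/s default is the best case mentioned above, so the result should be read as a lower bound on the time.  The 500 GB index size is a hypothetical number for illustration, not something from this thread, and the function name is made up.

```python
def estimated_merge_seconds(index_size_gb, merge_rate_mb_per_s=30.0):
    """Estimate merge time from total source index size and a merge rate.

    Assumes 1 GB = 1024 MB and a constant rate, which is a simplification.
    """
    size_mb = index_size_gb * 1024
    return size_mb / merge_rate_mb_per_s


# Hypothetical 500 GB of source indexes at the best-case rate.
seconds = estimated_merge_seconds(500)
print(f"{seconds / 3600:.1f} hours")
```

Swapping the disk for a faster one does not change the rate in this calculation, because the rate is set by the merge code and not by the hardware's transfer speed.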

> We need to merge because the data are indexed in two different collections,
> and we need them to be under the same collection, so that we can do things
> like faceting more accurately.
> Will sharding alone achieve this? Or do we have to merge first before we do
> the sharding?

If you want the final index to be sharded, it's typically best to index 
from scratch into a new empty collection that has the number of shards 
you want.  The merging tool you're using isn't aware of concepts like 
shards.  It combines everything into a single index.
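
As a sketch of the first step of that approach, the Python below builds a Collections API CREATE request URL for a new empty collection in SolrCloud.  The host, the collection name `combined`, and the shard count are placeholders I made up.  It only constructs the URL and does not send anything.  Depending on your Solr version and setup you may also need to name a configset, which is left out here.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor=1):
    """Build a Collections API CREATE URL (does not send the request)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder host and collection name, for illustration only.
print(create_collection_url("http://localhost:8983/solr", "combined", 2))
```

Once the collection exists, you would send all documents from both original data sources to it with your normal indexing process, and SolrCloud routes each document to a shard.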

It's not entirely clear what you're asking with the question about 
sharding alone.  Making a guess:  I have never heard of facet accuracy 
being affected by whether or not the index is sharded.  If that *is* 
possible, then I would expect an index that is NOT sharded to have 
better accuracy.
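
To show why I would expect that, here is a toy model in Python.  It is NOT how Solr implements distributed faceting.  It is a deliberately naive scheme in which each shard reports only its own top N terms and the results are summed, with no follow-up requests.  The shard contents and function names are invented for the example.

```python
from collections import Counter


def top_n(counts, n):
    """Return the n highest-count terms from one shard."""
    return dict(Counter(counts).most_common(n))


def naive_sharded_facet(shard_counts, n):
    """Sum only each shard's top n terms (toy scheme, no refinement)."""
    merged = Counter()
    for counts in shard_counts:
        merged.update(top_n(counts, n))
    return dict(merged.most_common(n))


def unsharded_facet(shard_counts, n):
    """Count over all documents, as a single unsharded index would."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)
    return dict(total.most_common(n))


# Invented per-shard term counts.
shards = [{"a": 10, "b": 9, "c": 8}, {"c": 10, "d": 9, "a": 1}]
print(naive_sharded_facet(shards, 2))
print(unsharded_facet(shards, 2))
```

In this made-up data, the naive scheme never sees the 8 occurrences of "c" on the first shard because "c" is not in that shard's top 2, so it undercounts.  A single index has all the counts in one place and cannot lose them that way.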

Thanks,
Shawn

