lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Solr segment merging in different replica
Date Fri, 05 Feb 2016 03:58:14 GMT
Hi Shawn,

Thanks for your reply.

Yes, we were planning for the case where a replica goes down during
indexing and, when it restarts, starts copying the index over from
the main node.


Regards,
Edwin


On 5 February 2016 at 03:35, Shawn Heisey <apache@elyograg.org> wrote:

> On 2/4/2016 9:27 AM, Zheng Lin Edwin Yeo wrote:
> > Yes, I'm already on SolrCloud, so I'll probably stick to that.
> >
> > Regarding the network, I am just afraid that when the replica code copies
> > the index over from the main node, it will use up all the available
> > bandwidth, and cause the search queries to have little bandwidth left,
> > which will affect the performance of the search from the front-end.
>
> Replicating the index in SolrCloud should be a VERY rare event, only
> happening when there's a serious problem such as a server going down and
> coming back up later, or after certain maintenance events.
>
> Merges do not involve network traffic.  In SolrCloud, each replica will
> handle merging locally.  It does not happen over the network.
>
> Even if a replication DOES happen, TCP makes room on the network for new
> connections like queries.  It's inherent in the design of the protocol.
> This is particularly effective on LAN connectivity.  If there's a WAN
> involved, then you might be right to worry about bandwidth.
>
> Regarding something you asked earlier in the thread: Assuming LAN
> connectivity, I think the only thing you will achieve by using separate
> network interfaces is configuration complexity.
>
> It might be possible to separate the interfaces, even though I think
> it's not required.  If you populate the hosts file on each server, or
> use split DNS, you could have clients use a different address than the
> Solr servers themselves use for inter-node communication, but in general
> there is no need for this, because high network bandwidth utilization is
> only likely during a replication event, or during bulk indexing to
> rebuild collections.  For bulk indexing, the CPU and disk I/O impact
> will almost certainly cause more of a slowdown than the network, unless
> you're using a low-speed WAN, which is not recommended.
>
> Thanks,
> Shawn
>
>
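
Since a full index copy should only happen when a replica is recovering, one
way to notice such an event is to poll the Collections API CLUSTERSTATUS
action and look for replicas whose state is not "active". The following is a
minimal sketch, not something from the thread: the base URL, the function
names, and the sample cluster layout (collection1, shard1, core_node1,
core_node2) are all assumptions made for illustration.

```python
import json
from urllib.request import urlopen


def non_active_replicas(cluster_status):
    """Return sorted (collection, shard, replica, state) tuples for every
    replica whose reported state is anything other than 'active'."""
    problems = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                state = replica.get("state", "unknown")
                if state != "active":
                    problems.append((coll_name, shard_name, replica_name, state))
    return sorted(problems)


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS as a dict. base_url is a placeholder such as
    'http://localhost:8983/solr'; adjust it for your own installation."""
    url = base_url.rstrip("/") + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Hand-written sample in the general shape of a CLUSTERSTATUS response.
# All names here are hypothetical and the response is heavily trimmed.
SAMPLE = {
    "cluster": {
        "collections": {
            "collection1": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "active", "leader": "true"},
                            "core_node2": {"state": "recovering"},
                        }
                    }
                }
            }
        }
    }
}

if __name__ == "__main__":
    for coll, shard, replica, state in non_active_replicas(SAMPLE):
        print(f"{coll}/{shard}/{replica} is {state}")
```

Against a live cluster you would pass the result of fetch_cluster_status()
to non_active_replicas() instead of SAMPLE. A replica that stays in the
"recovering" state for a long time may be doing a full index copy, which is
the situation where watching network usage would be worthwhile.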
