zookeeper-dev mailing list archives

From Michael Han <h...@apache.org>
Subject Re: improving tolerance to network failures
Date Tue, 23 Oct 2018 17:21:37 GMT
>> Will there be a code effect?

There will be. The current rebalancing algorithm will break unless
StaticHostProvider.updateServerList is changed to make it aware that
multiple server addresses can belong to the same server. For example, today
if we add a new server through reconfig, the rebalance kicks in. Under the
new proposal, if we add a new address to an existing server and
updateServerList is left unchanged, the rebalance will also kick in, but it
should not, because no new real server has been added.
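
The check described above can be illustrated with a minimal sketch. This is
not code from ZooKeeper or from the proposal: the class ServerGrouping, the
method realServersChanged, and the use of string server ids as map keys are
all hypothetical names chosen for illustration. It assumes the client can
group addresses by the logical server they belong to, and it treats a change
as relevant for rebalancing only when the set of logical servers changes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of ZooKeeper: decides whether an updated
// server list contains a different set of real servers than the old one.
final class ServerGrouping {
    private ServerGrouping() {}

    // Keys are hypothetical logical server ids; values are the "host:port"
    // addresses that all belong to that one server.
    static boolean realServersChanged(Map<String, List<String>> oldServers,
                                      Map<String, List<String>> newServers) {
        Set<String> before = new HashSet<>(oldServers.keySet());
        Set<String> after = new HashSet<>(newServers.keySet());
        // Only the set of logical servers matters here. Adding or removing
        // an address of an existing server leaves both key sets equal.
        return !before.equals(after);
    }
}
```

With this kind of grouping, adding a second address to an existing server
would not count as a change in ensemble membership, while adding or removing
a whole server would.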

>> My own experience is that production settings typically involve
Zookeeper servers with very consistent hardware where this would not be an
issue.

I think this is generally true, but we should consider cases where a user is
upgrading hardware, which might take a while. During that time it would
be ideal if ZK offered the capability of balancing client connections across
an ensemble with heterogeneous hardware. As a user myself, I'd like to have
this feature, especially considering that it does not seem hard to implement.
What Alex proposed should work. Another approach might be to assign a weight
to each address (so that a single server has weight one), which reduces the
problem to weighted random selection.
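
A minimal sketch of the weighted approach follows. It is one possible reading
of the idea, not an implementation from the thread: it assumes that a server
with k addresses gives each address weight 1/k, so every server's total
weight is one. The class WeightedAddressPicker and its methods are
hypothetical names, and the map keys stand in for logical server ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch, not part of ZooKeeper: picks an address at random so
// that every server is equally likely, however many addresses it has.
final class WeightedAddressPicker {
    private final List<String> addresses = new ArrayList<>();
    private final List<Double> cumulative = new ArrayList<>();
    private double total = 0.0;

    // Keys are hypothetical logical server ids; values are that server's addresses.
    WeightedAddressPicker(Map<String, List<String>> servers) {
        for (List<String> addrs : servers.values()) {
            if (addrs.isEmpty()) {
                continue; // a server with no address cannot be selected
            }
            double weight = 1.0 / addrs.size(); // assumption: per-server weights sum to one
            for (String address : addrs) {
                total += weight;
                addresses.add(address);
                cumulative.add(total); // running sum used for the lookup in pick()
            }
        }
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("no addresses to choose from");
        }
    }

    // Draws a point in [0, total) and returns the address whose cumulative
    // weight interval contains it.
    String pick(Random rnd) {
        double r = rnd.nextDouble() * total;
        for (int i = 0; i < addresses.size(); i++) {
            if (r < cumulative.get(i)) {
                return addresses.get(i);
            }
        }
        return addresses.get(addresses.size() - 1); // guards against rounding error
    }
}
```

Under this assumption, a server with two network interfaces receives about as
many new connections as a server with one, because each of its two addresses
is chosen half as often.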

Overall, I think this proposal has little impact on the server side; most of
the impact is on the client side.


On Tue, Oct 23, 2018 at 9:34 AM Ted Dunning <tdunning@apache.org> wrote:

> There have been several comments on the document. I will be porting
> discussions from the document back to the mailing list each day.
>
> Alex Shraer makes a good point that with the design as stated, there is no
> provision for dealing with the rebalancing of client connections during
> dynamic reconfiguration. I am very curious whether this needs to be
> addressed in the design since it seems that if connections are redirected,
> the same connection logic should apply. I suppose the text needs an update,
> regardless, even if there is no effect. But is there something I missed
> here? Will there be a code effect?
>
> Another comment points out that if you don't have symmetrical hardware for
> the servers (i.e. more network interfaces on some), then client connections
> are likely to be more numerous on servers with more network connections.
> This is undoubtedly true.
>
> I have a question, however, about this. Is this situation actually
> important enough to make the first version of this change? My own
> experience is that production settings typically involve Zookeeper servers
> with very consistent hardware where this would not be an issue.
>
> What experience do others have, particularly in production situations?
>
> On 2018/10/23 02:02:12, Ted Dunning <ted.dunning@gmail.com> wrote:
> > ...
> > I have started a collaborative document to work on the design approach.
> > Once that is judged by the community to be sufficiently mature, I will
> > move it to a JIRA.
> >
> > That document is at
> >
> > https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing
> >
> > The design document is currently open to the world for commenting so that
> > anybody can suggest changes or ask questions. I will act as a bit of a
> > moderator so that the document can remain completely open.
> >
>
