impala-reviews mailing list archives

From "David Knupp (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnectionLoss exceptions
Date Wed, 21 Dec 2016 19:43:17 GMT
David Knupp has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnectionLoss exceptions
......................................................................


Patch Set 2:

(4 comments)

Success output:
  Connecting to Zookeeper host(s). 
  Success: <kazoo.client.KazooClient object at 0x7fc349f3c210>
  Waiting for HBase node: /hbase/master
  Success: /hbase/master
  Waiting for HBase node: /hbase/rs
  Success: /hbase/rs
  Stopping Zookeeper client

Success with connection retry output:
  Connecting to Zookeeper host(s).
  Success: <kazoo.client.KazooClient object at 0x7f0840813210>
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (1 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success: <kazoo.client.KazooClient object at 0x7f084371ad50>
  Waiting for HBase node: /hbase/master
  Success: /hbase/master
  Waiting for HBase node: /hbase/rs
  Success: /hbase/rs
  Stopping Zookeeper client
  HBase startup scripts succeeded

Error output with ConnectionLoss:
  Connecting to Zookeeper host(s).
  Success: <kazoo.client.KazooClient object at 0x7f66198f7210>
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (1 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success: <kazoo.client.KazooClient object at 0x7f661c7fdd50>
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (2 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success: <kazoo.client.KazooClient object at 0x7f661929fcd0>
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (3 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success: <kazoo.client.KazooClient object at 0x7f66192ac490>
  Waiting for HBase node: /hbase/master
  Stopping Zookeeper client
  Traceback (most recent call last):
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 193, in <module>
      errors = check_znodes_list_for_errors(args.nodes, args.zookeeper_hosts, args.timeout)
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 141, in check_znodes_list_for_errors
      errors = sum([check_znode(node, zk_client, timeout) for node in nodes])
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 112, in check_znode
      raise ConnectionLoss
  kazoo.exceptions.ConnectionLoss

Error output with an unexpected error (RuntimeError):
  Connecting to Zookeeper host(s).
  Success: <kazoo.client.KazooClient object at 0x7f703e3bc210>
  Waiting for HBase node: /hbase/master
  Unexpected error checking HBase node:
  Stopping Zookeeper client
  Traceback (most recent call last):
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 188, in <module>
      errors = check_znodes_list_for_errors(args.nodes, args.zookeeper_hosts, args.timeout)
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 140, in check_znodes_list_for_errors
      return sum([check_znode(node, zk_client, timeout) for node in nodes])
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 112, in check_znode
      raise RuntimeError
  RuntimeError

http://gerrit.cloudera.org:8080/#/c/5554/2/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

PS2, Line 134: errors
> error counting seems to be off? You are overwriting it in L140. Perhaps use
Actually, I don't think we need to increment errors in the case of exceptions. Thanks for
pointing this out.


Line 143:                 LOGGER.warn("Zookeeper connection loss: retrying connection")
> Might be worth logging the current connection attempt and the maximum attem
Done


Line 143:                 LOGGER.warn("Zookeeper connection loss: retrying connection")
> Log the exception trace here as well, just in case we need it for debugging?
Done


Line 145:                 errors += 1
> I suggest waiting a little before retrying to connect. How about 1s?
Done
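
For reference, the retry behavior discussed in the comments above (log the current
attempt and the maximum, log the exception trace, wait about 1s before reconnecting,
and let the exception propagate instead of counting it as an error) could be sketched
roughly as follows. This is not the code in the patch set: check_with_retries, check_fn,
max_attempts, delay_s and the injectable sleep are hypothetical names, and the local
ConnectionLoss class is only a stand-in for kazoo.exceptions.ConnectionLoss so that the
sketch runs without kazoo installed.

```python
import logging
import time

LOGGER = logging.getLogger(__name__)


class ConnectionLoss(Exception):
    """Stand-in for kazoo.exceptions.ConnectionLoss (assumption for this sketch)."""


def check_with_retries(check_fn, max_attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call check_fn(), retrying up to max_attempts times on ConnectionLoss.

    check_fn is a hypothetical callable that connects, checks the znodes and
    returns an error count. Exceptions are not added to that count: once the
    retries are used up, ConnectionLoss is re-raised to the caller.
    """
    retries = 0
    while True:
        try:
            return check_fn()
        except ConnectionLoss:
            if retries >= max_attempts:
                raise  # out of retries: surface the failure with its traceback
            retries += 1
            # exc_info=True attaches the exception trace to the log record.
            LOGGER.warning(
                "Zookeeper connection loss: retrying connection (%d of %d attempts)",
                retries, max_attempts, exc_info=True)
            sleep(delay_s)  # brief pause before reconnecting
```

With max_attempts=3 this makes one initial call plus at most three retries, which
lines up with the four "Connecting to Zookeeper host(s)." lines in the ConnectionLoss
error output above. Passing sleep as a parameter is only there to keep the sketch easy
to exercise without real delays.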


-- 
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-HasComments: Yes
