zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log
Date Mon, 17 Jul 2017 16:35:01 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090045#comment-16090045
] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
-------------------------------------------

Github user karanmehta93 commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/307#discussion_r127757305
  
    --- Diff: src/java/test/org/apache/zookeeper/server/ZooKeeperServerMainTest.java ---
    @@ -138,14 +145,56 @@ void delete(File f) throws IOException {
             ServerCnxnFactory getCnxnFactory() {
                 return main.getCnxnFactory();
             }
    +
         }
     
    -    public static  class TestZKSMain extends ZooKeeperServerMain {
    +    public static class TestZKSMain extends ZooKeeperServerMain {
    +
    +        private ServerStats serverStats;
    +
    +        @Override
    +        public ZooKeeperServer getZooKeeperServer(FileTxnSnapLog txnLog, ServerConfig
config, ZKDatabase zkDb) {
    +            ZooKeeperServer zooKeeperServer = super.getZooKeeperServer(txnLog, config,
zkDb);
    +            serverStats = zooKeeperServer.serverStats();
    +            return zooKeeperServer;
    +        }
    +
    +        @Override
             public void shutdown() {
                 super.shutdown();
             }
         }
     
    +    // Test for ZOOKEEPER-2770 ZooKeeper slow operation log
    +    @Test
    +    public void testRequestWarningThreshold() throws IOException, KeeperException, InterruptedException
{
    +        ClientBase.setupTestEnv();
    +
    +        final int CLIENT_PORT = PortAssignment.unique();
    +
    +        MainThread main = new MainThread(CLIENT_PORT, true, null, 0);
    +        main.start();
    +
    +        Assert.assertTrue("waiting for server being up",
    +                ClientBase.waitForServerUp("127.0.0.1:" + CLIENT_PORT,
    +                        CONNECTION_TIMEOUT));
    +        // Get the stats object from the ZooKeeperServer to keep track of high latency
requests.
    +        ServerStats stats = main.main.serverStats;
    +
    +        ZooKeeper zk = new ZooKeeper("127.0.0.1:" + CLIENT_PORT,
    +                ClientBase.CONNECTION_TIMEOUT, this);
    +
    +        zk.create("/foo1", "foobar".getBytes(), Ids.OPEN_ACL_UNSAFE,
    +                CreateMode.PERSISTENT);
    +
    +        Assert.assertEquals(new String(zk.getData("/foo1", null, null)), "foobar");
    +        // It takes a while for the counter to get updated sometimes, this is added to
reduce flakyness
    +        Thread.sleep(1000);
    --- End diff --
    
    @tdunning Yes a conditional timeout is a better option.
    From what I understand, the failure can be cause in test-only scenario where there is
no physical network between client and server, which sometimes results in client getting back
the acknowledge and the server is just yet to complete incrementing the count. IMO, it is
not possible in actual cluster. 
    > Would this be susceptible to a sync() to force the test to come up to date with the
leader?
    
    I am not sure what you mean by this. Could you please explain?


> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why any given
read or write operation may become slow: a software bug, a protocol problem, a hardware issue
with the commit log(s), a network issue. If the problem is constant it is trivial to come
to an understanding of the cause. However in order to diagnose intermittent problems we often
don't know where, or when, to begin looking. We need some sort of timestamped indication of
the problem. Although ZooKeeper is not a datastore, it does persist data, and can suffer intermittent
performance degradation, and should consider implementing a 'slow query' log, a feature very
common to services which persist information on behalf of clients which may be sensitive to
latency while waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally processing the
request, that the current time minus arrival time of the request is beyond a configured threshold.

> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message