hbase-issues mailing list archives

From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-6748) Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback
Date Tue, 15 Jan 2013 01:50:13 GMT

     [ https://issues.apache.org/jira/browse/HBASE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Zhong updated HBASE-6748:
---------------------------------

    Attachment: hbase-6748.patch


I looked into this and was able to reproduce it once, though not consistently.

There are two problems:
1) the delete is retried with a retry count of Long.MAX_VALUE
2) the new zk client instance created during master abort may not be seen by other threads,
because the field is not declared volatile
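
To illustrate problem 1, here is a minimal, self-contained sketch of how a retry issued from inside
a callback can exhaust the stack. It is not the SplitLogManager code. It assumes that once the
session has expired the client fails each request by invoking the callback synchronously on the
calling thread, which is what the single thread name in the quoted logs suggests. All names
(RetryOverflowSketch, DeleteCallback, deleteAsync) are hypothetical.

```java
public class RetryOverflowSketch {
    /** Hypothetical callback type; stands in for an async delete callback. */
    public interface DeleteCallback {
        void onResult(boolean sessionExpired, long retriesLeft);
    }

    /** Number of delete attempts made during the last run. */
    public static long attempts;

    /**
     * Hypothetical client with an expired session (an assumption of this sketch):
     * every request fails at once, and the callback runs on the caller's thread.
     */
    public static void deleteAsync(long retriesLeft, DeleteCallback cb) {
        attempts++;
        cb.onResult(true, retriesLeft);
    }

    /** Retries from inside the callback, so each retry adds stack frames. */
    public static void retryingCallback(boolean sessionExpired, long retriesLeft) {
        if (sessionExpired && retriesLeft > 0) {
            deleteAsync(retriesLeft - 1, RetryOverflowSketch::retryingCallback);
        }
    }

    /** Returns true if retrying with the given budget overflows the stack. */
    public static boolean overflows(long retries) {
        attempts = 0;
        try {
            deleteAsync(retries, RetryOverflowSketch::retryingCallback);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("retries=3 overflows: " + overflows(3));
        System.out.println("retries=Long.MAX_VALUE overflows: " + overflows(Long.MAX_VALUE));
    }
}
```

With a small retry budget the recursion ends after a few frames. With Long.MAX_VALUE the budget
never runs out before the stack does.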
    
The attached patch includes:
1) refactoring the code so that ZK session expiry is handled consistently in all zk async callback
functions, as we currently do in the CreateRescan & GetData async callbacks
2) retrying deletion in TimeoutMonitor, where other maintenance work is done, and removing the
existing infinite-loop-like async calls, which may jam the callback queue
3) making RecoverableZooKeeper.zk volatile
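
The idea behind items 2 and 3 can be sketched roughly as follows. This is not the patch itself,
only an illustration under stated assumptions: Client is a hypothetical stand-in for the zk
wrapper, chore() stands in for the periodic pass that TimeoutMonitor makes, and a synchronous
delete() is used to keep the example short.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DeferredDeleteSketch {
    /** Hypothetical client; delete() returns false while the session is unusable. */
    public interface Client {
        boolean delete(String path);
    }

    // volatile: a client swapped in by one thread is visible to later reads
    // made by other threads.
    private volatile Client client;

    // Paths whose deletion failed; retried by the periodic chore, not by the callback.
    private final Set<String> failedDeletions = ConcurrentHashMap.newKeySet();

    public DeferredDeleteSketch(Client client) {
        this.client = client;
    }

    /** Installs a fresh client, e.g. after the old session expired. */
    public void replaceClient(Client fresh) {
        this.client = fresh;
    }

    /** Tries once; on failure the path is parked instead of being retried in place. */
    public void deleteNode(String path) {
        if (!client.delete(path)) {
            failedDeletions.add(path);
        }
    }

    /** One periodic maintenance pass: at most one retry per parked path. */
    public void chore() {
        for (String path : failedDeletions) {
            if (client.delete(path)) {
                failedDeletions.remove(path);
            }
        }
    }

    /** Number of deletions still waiting for a successful retry. */
    public int pending() {
        return failedDeletions.size();
    }

    public static void main(String[] args) {
        DeferredDeleteSketch sketch = new DeferredDeleteSketch(path -> false);
        sketch.deleteNode("/example/task");
        sketch.chore();
        System.out.println("pending while expired: " + sketch.pending());
        sketch.replaceClient(path -> true);
        sketch.chore();
        System.out.println("pending after new client: " + sketch.pending());
    }
}
```

Because each pass makes at most one attempt per path, a persistent failure costs one call per
period instead of an unbounded chain of callbacks.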

Thanks,
-Jeffrey

                
> Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-6748
>                 URL: https://issues.apache.org/jira/browse/HBASE-6748
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.1, 0.96.0
>            Reporter: Jieshan Bean
>             Fix For: 0.96.0, 0.94.5
>
>         Attachments: hbase-6748.patch
>
>
> You can easily understand the problem from the logs below:
> {code}
> [2012-09-01 11:41:02,062] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=3
> [2012-09-01 11:41:02,062] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=2
> [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=1
> [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] create rc =SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=0
> [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager 393] failed to create task node/hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
> [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager 353] Error splitting /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
> [2012-09-01 11:41:02,063] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775807
> [2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775806
> [2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775805
> [2012-09-01 11:41:02,064] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775804
> [2012-09-01 11:41:02,065] [WARN ] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] delete rc=SESSIONEXPIRED for /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846 remaining retries=9223372036854775803
> ...................
> [2012-09-01 11:41:03,307] [ERROR] [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] [org.apache.zookeeper.ClientCnxn 623] Caught unexpected throwable
> java.lang.StackOverflowError
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
