hbase-issues mailing list archives

From "Prakash Khemani (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
Date Mon, 30 Apr 2012 22:31:48 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265461#comment-13265461 ]

Prakash Khemani commented on HBASE-5860:
----------------------------------------

I had missed the fact that isAnyCreateZKNodePending() does not account for the creation of
RESCAN nodes. Will provide a fix.

I was aware of the race condition where isAnyCreateZKNodePending() can return false even
when a failed create of a znode is about to be retried. That is not worth fixing, for the reason
you outlined: creating an extra RESCAN node doesn't hurt. (The code change you outlined would
need some further changes to work.)
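A minimal sketch of the bookkeeping under discussion, assuming a simple in-flight counter. `PendingCreateTracker`, `beginCreate()` and `endCreate()` are hypothetical names and this is not the actual SplitLogManager code; only `isAnyCreateZKNodePending()` comes from the comment above. The point of the fix is that RESCAN-node creates have to be counted the same way as task-znode creates.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tracker for in-flight asynchronous znode creates.
// A sketch only; not the real SplitLogManager implementation.
class PendingCreateTracker {
    private final AtomicInteger pending = new AtomicInteger();

    // Call before issuing ANY async create: task znodes and RESCAN znodes alike.
    // Leaving RESCAN creates out is exactly the kind of gap described above.
    void beginCreate() {
        pending.incrementAndGet();
    }

    // Call from the create callback once this attempt has finished,
    // whether it succeeded or failed.
    void endCreate() {
        pending.decrementAndGet();
    }

    boolean isAnyCreateZKNodePending() {
        return pending.get() > 0;
    }
}
```

In this sketch, the acknowledged race corresponds to the gap between a failed attempt's `endCreate()` and its retry's `beginCreate()`: a caller that samples the counter in that window sees no pending creates and may write one extra RESCAN node, which, as noted, is harmless.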
                
> splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5860
>                 URL: https://issues.apache.org/jira/browse/HBASE-5860
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Prakash Khemani
>            Assignee: Prakash Khemani
>         Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch
>
>
> (Doesn't really impact the run time or correctness of log splitting.)
> Say the master has lost its connection to ZK. The splitlogmanager's timeoutmanager will notice
> that all the submitted tasks are still unassigned and will resubmit them (i.e. create dummy
> znodes).
> The splitlogmanager should realize that the tasks are unassigned only because their znodes
> have not been created yet.
> 2012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026
> 2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of logs to split
> 2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting]
> 2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181
> 2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, initiating session
> 2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned = 4
> 2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout
> 2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout
> 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x36ccb0f8010002, likely server has closed socket, closing socket connection and attempting reconnect
> 2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x136ccb0f4890000, likely server has closed socket, closing socket connection and attempting reconnect
> 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677 retry=3
> 2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332 retry=3
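The improvement requested in the issue can be sketched as follows, assuming a simplified task model. Every name here (`SplitTask`, `znodeCreated`, `CreateResult`, `ZnodeCreator`, `SplitTaskSketch`) is hypothetical and not the real HBase or ZooKeeper API. The create path retries on `CONNECTIONLOSS` with a retry budget, mirroring the `retry=3` log lines, and records whether the task znode exists; the timeout check then resubmits only unassigned tasks whose znode was actually created.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result codes, loosely modelled on ZooKeeper's; not KeeperException.Code.
enum CreateResult { OK, NODEEXISTS, CONNECTIONLOSS }

// Hypothetical "issue one create attempt for this path" abstraction.
interface ZnodeCreator {
    CreateResult create(String path);
}

// Hypothetical, simplified view of a split-log task.
class SplitTask {
    final String path;
    volatile boolean znodeCreated; // true once the task znode is known to exist
    volatile boolean assigned;     // true once a worker has taken the task

    SplitTask(String path) {
        this.path = path;
    }
}

class SplitTaskSketch {
    // Synchronous stand-in for an asynchronous create-and-retry callback:
    // retries on CONNECTIONLOSS until the retry budget is used up.
    static void createTaskZnode(ZnodeCreator zk, SplitTask task, int retries) {
        while (true) {
            CreateResult rc = zk.create(task.path);
            if (rc == CreateResult.OK || rc == CreateResult.NODEEXISTS) {
                task.znodeCreated = true;
                return;
            }
            System.err.println("create rc =" + rc + " for " + task.path + " retry=" + retries);
            if (retries == 0) {
                return; // give up; znodeCreated stays false
            }
            retries--;
        }
    }

    // Timeout check: only unassigned tasks whose znode exists are resubmitted.
    // A task without a znode was never visible to any worker, so resubmitting
    // it cannot help while ZK is unreachable.
    static List<SplitTask> tasksToResubmit(List<SplitTask> tasks) {
        List<SplitTask> result = new ArrayList<>();
        for (SplitTask t : tasks) {
            if (!t.assigned && t.znodeCreated) {
                result.add(t);
            }
        }
        return result;
    }
}
```

The retry loop is synchronous here for brevity; as the `SplitLogManager$CreateAsyncCallback` log lines indicate, the real create path is asynchronous, which is why there is a window in which a task is unassigned and its znode does not exist yet.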

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
