hbase-issues mailing list archives

From "binlijin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
Date Thu, 23 Nov 2017 02:58:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263720#comment-16263720 ]

binlijin commented on HBASE-19290:
----------------------------------

[~tedyu]
bq. Assuming patch v3 is very close to the version you run in the 2000+ node production cluster,
can you post some performance numbers (in terms of reduction in zookeeper requests) so that
we can know its effectiveness ?

We did not record performance numbers.
But without the patch we can see that the HMaster receives the ZooKeeper events very, very slowly, as the timestamps below show.

The HMaster puts up the split log task:

*2017-07-11 20:22:57,608* DEBUG [main-EventThread] coordination.SplitLogManagerCoordination:
put up splitlog task at znode /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548

A RegionServer grabs the task and completes it:

*2017-07-11 20:23:33,689* INFO  [SplitLogWorker-hadoop1435:16020] coordination.ZkSplitLogWorkerCoordination:
worker hadoop1435.et2.tbsite.net,16020,1495647366458 acquired task /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548

*2017-07-11 20:25:47,131* INFO  [RS_LOG_REPLAY_OPS-hadoop1435:16020-1] coordination.ZkSplitLogWorkerCoordination:
successfully transitioned task /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548
to final state DONE hadoop1435.et2.tbsite.net,16020,1495647366458

The HMaster gets the task-done event and deletes the task, but only much later:

*2017-07-11 20:49:52,879* INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
task /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548
entered state: DONE hadoop1435.et2.tbsite.net,16020,1495647366458

*2017-07-11 20:49:52,881* INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
Done splitting /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548

*2017-07-11 21:19:52,280* DEBUG [main-EventThread] coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback:
deleted /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548
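
Note the gap in the timestamps above: the worker finished the split at 20:25, but the master only processed the DONE event at 20:49, and the delete callback fired at 21:19. For context, here is a minimal raw-ZooKeeper sketch of the znode lifecycle these log lines trace; the quorum address, the state strings and the shortened task path are placeholders, and the real logic lives in SplitLogManagerCoordination / ZkSplitLogWorkerCoordination, not in this sketch.

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class SplitLogLifecycleSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder quorum; a no-op watcher is enough for this sketch.
    ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, event -> { });

    // Placeholder for the long URL-encoded WAL path seen in the log above.
    String task = "/hbase/splitWAL/encoded-wal-path";

    // 1. Master "puts up" the split log task as an unassigned znode.
    zk.create(task, "UNASSIGNED".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

    // 2. A worker acquires the task by rewriting its data (owned by the worker).
    zk.setData(task, "OWNED worker-host,16020,ts".getBytes(), -1);

    // 3. After splitting the WAL, the worker transitions the task to DONE.
    zk.setData(task, "DONE worker-host,16020,ts".getBytes(), -1);

    // 4. The master, watching the znode, sees DONE and deletes the task;
    //    in the log above this step lagged behind by 25-30 minutes.
    zk.delete(task, -1);

    zk.close();
  }
}
{code}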


> Reduce zk request when doing split log
> --------------------------------------
>
>                 Key: HBASE-19290
>                 URL: https://issues.apache.org/jira/browse/HBASE-19290
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-19290.master.001.patch, HBASE-19290.master.002.patch, HBASE-19290.master.003.patch,
HBASE-19290.master.004.patch
>
>
> We observed that once the cluster has 1000+ nodes, when hundreds of nodes abort and their logs need splitting, the split is very, very slow. We found the RegionServers and the HMaster waiting on ZooKeeper responses, so we need to reduce the ZooKeeper requests and pressure for big clusters.
> (1) Reduce requests to rsZNode: every call to calculateAvailableSplitters fetches rsZNode's children from ZooKeeper, which is heavy when the cluster is huge. This patch reduces those requests.

> (2) When a RegionServer already has the maximum number of split tasks running, it may still try to grab tasks and issue ZooKeeper requests; we should sleep and wait until we can grab tasks again (both ideas are sketched below).
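
Not the actual patch, just a rough sketch of the two ideas above. The class and member names (SplitWorkerThrottleSketch, liveRegionServers, maybeGrabTask, tasksInProgress) are invented for illustration; only /hbase/rs, calculateAvailableSplitters and the max-tasks limit come from the issue description.

{code:java}
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooKeeper;

public class SplitWorkerThrottleSketch {
  private final ZooKeeper zk;
  private final int maxConcurrentTasks;
  private final AtomicInteger tasksInProgress = new AtomicInteger(0);
  // (1) Cache the live region server count instead of listing /hbase/rs
  //     children on every calculateAvailableSplitters call.
  private volatile int cachedRsCount = -1;

  public SplitWorkerThrottleSketch(ZooKeeper zk, int maxConcurrentTasks) {
    this.zk = zk;
    this.maxConcurrentTasks = maxConcurrentTasks;
  }

  private int liveRegionServers() throws KeeperException, InterruptedException {
    if (cachedRsCount < 0) {
      // One getChildren call also sets a watch; the watch invalidates the
      // cache on membership change, so steady-state grabs hit the cache.
      List<String> rs = zk.getChildren("/hbase/rs", event -> {
        if (event.getType() == EventType.NodeChildrenChanged) {
          cachedRsCount = -1;
        }
      });
      cachedRsCount = rs.size();
    }
    return cachedRsCount;
  }

  public void maybeGrabTask(String taskZnode) throws Exception {
    // (2) If this server already runs the maximum number of split tasks,
    //     sleep instead of issuing doomed grab requests to ZooKeeper.
    while (tasksInProgress.get() >= maxConcurrentTasks) {
      Thread.sleep(1000);
    }
    if (liveRegionServers() <= 0) {  // cheap after the first call
      return;
    }
    // ... attempt to acquire taskZnode (setData to OWNED) and increment
    //     tasksInProgress here, as the real worker coordination does.
  }
}
{code}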



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
