hbase-issues mailing list archives

From "Himanshu Vashishtha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8911) Inject MTTR specific traces to get a break up of various steps
Date Wed, 10 Jul 2013 01:18:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704080#comment-13704080 ]

Himanshu Vashishtha commented on HBASE-8911:
--------------------------------------------

Using a patched version, I killed the regionserver hosting the meta region (it also hosted one non-meta region).
The master produced the following trace dump:
{code}
1) {"Description":"SplitLogManager","Start":1373415141061,"Annotations":{},"TraceID":7794030922126752097,"ParentID":-4469867469428889343,"Stop":1373415145255,"SpanID":-948196168508937537}

2) {"Description":"MetaServerShutdownHandler","Start":1373415141027,"Annotations":{},"TraceID":7794030922126752097,"ParentID":477902,"Stop":1373415145379,"SpanID":-4469867469428889343}

3) {"Description":"SplitLogManager","Start":1373415146016,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415150138,"SpanID":-8986584144293319916}

4) {"Description":"ServerShutdownHandler: AssignmentManager","Start":1373415150138,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415150181,"SpanID":5233034729044488427}

5) {"Description":"ServerShutdownHandler","Start":1373415145380,"Annotations":{},"TraceID":3471834208649164937,"ParentID":477902,"Stop":1373415150181,"SpanID":7097670840195911759}

{code}

h3. Explanation:

The meta region is handled first. Line 1) is the log splitting of the meta logs. Line 2)
is the processing time of MetaServerShutdownHandler (MetaSSH); note that its start/stop
interval covers the log splitting span of line 1), which is its child.

Lines 3), 4) and 5) cover processing the non-meta logs and assigning the regions of the dead
regionserver. Line 3) is log splitting, line 4) is region assignment, and line 5) is the
parent span of 3) and 4).
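
As a rough sketch of the reading above, the per-step times can be computed from the dump directly. This assumes the Start/Stop values are epoch milliseconds (not stated in the dump); the helper names `duration_ms`, `children_of` and `covers` are made up for illustration.

```python
# The five master spans from the dump above, trimmed to the fields used here.
MASTER_SPANS = [
    {"Description": "SplitLogManager", "Start": 1373415141061, "Stop": 1373415145255,
     "ParentID": -4469867469428889343, "SpanID": -948196168508937537},
    {"Description": "MetaServerShutdownHandler", "Start": 1373415141027, "Stop": 1373415145379,
     "ParentID": 477902, "SpanID": -4469867469428889343},
    {"Description": "SplitLogManager", "Start": 1373415146016, "Stop": 1373415150138,
     "ParentID": 7097670840195911759, "SpanID": -8986584144293319916},
    {"Description": "ServerShutdownHandler: AssignmentManager", "Start": 1373415150138,
     "Stop": 1373415150181, "ParentID": 7097670840195911759, "SpanID": 5233034729044488427},
    {"Description": "ServerShutdownHandler", "Start": 1373415145380, "Stop": 1373415150181,
     "ParentID": 477902, "SpanID": 7097670840195911759},
]


def duration_ms(span):
    """Elapsed time of one span, assuming Start/Stop are epoch milliseconds."""
    return span["Stop"] - span["Start"]


def children_of(spans, parent):
    """Spans whose ParentID points at the given span's SpanID."""
    return [s for s in spans if s["ParentID"] == parent["SpanID"]]


def covers(parent, child):
    """True if the parent's start/stop interval encloses the child's."""
    return parent["Start"] <= child["Start"] and child["Stop"] <= parent["Stop"]


for number, span in enumerate(MASTER_SPANS, start=1):
    kids = [s["Description"] for s in children_of(MASTER_SPANS, span)]
    print(f"{number}) {span['Description']}: {duration_ms(span)} ms, children: {kids}")
```

With the numbers above, the two log splitting spans take 4194 ms and 4122 ms, while region assignment takes 43 ms, so log splitting accounts for most of each handler's time in this run.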


On the regionserver where the new meta lands, I get the following trace:
{code}
1) {"Description":"handling callId: 28 service: AdminService methodName: openRegion size: 67.0 connection: 10.20.188.114:49125","Start":1373415145297,"Annotations":{},"TraceID":7794030922126752097,"ParentID":-4469867469428889343,"Stop":1373415145380,"SpanID":-2912335450699571517}

2) {"Description":"handling callId: 29 service: ClientService methodName: scan size: 71.0 connection: 10.20.188.114:49126","Start":1373415145865,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415145866,"SpanID":-1834828785626104167}

3) {"Description":"RS_OPEN_META-a1215:40020-0","Start":1373415145376,"Annotations":{},"TraceID":7794030922126752097,"ParentID":-2912335450699571517,"Stop":1373415145872,"SpanID":522405091715978575}

4) {"Description":"handling callId: 30 service: ClientService methodName: scan size: 71.0 connection: 10.20.188.114:49126","Start":1373415145977,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415145999,"SpanID":-3636631271530731407}

5) {"Description":"handling callId: 31 service: ClientService methodName: scan size: 50.0 connection: 10.20.188.114:49126","Start":1373415146000,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415146005,"SpanID":2496204017007798885}

6) {"Description":"handling callId: 32 service: ClientService methodName: scan size: 48.0 connection: 10.20.188.114:49126","Start":1373415146009,"Annotations":{},"TraceID":3471834208649164937,"ParentID":7097670840195911759,"Stop":1373415146010,"SpanID":5149809990225735159}

7) {"Description":"handling callId: 33 service: AdminService methodName: openRegion size: 66.0 connection: 10.20.188.114:49125","Start":1373415150162,"Annotations":{},"TraceID":3471834208649164937,"ParentID":5233034729044488427,"Stop":1373415150183,"SpanID":-642245433468525490}

{code}

h3. Explanation:

Lines 1) and 3) are about opening the meta region, while the other lines are about handling the
other regions (scanning the meta and assigning the non-meta region).
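
The split between meta and non-meta work can also be seen mechanically: each regionserver span carries the TraceID of one of the two master-side handlers. A small sketch, with descriptions abbreviated by hand and `group_by_handler` a made-up helper name:

```python
from collections import defaultdict

# Top-level handler per TraceID, read off the master dump.
MASTER_ROOTS = {
    7794030922126752097: "MetaServerShutdownHandler",
    3471834208649164937: "ServerShutdownHandler",
}

# (abbreviated description, TraceID) for the seven regionserver spans, in dump order.
RS_SPANS = [
    ("callId 28 AdminService openRegion", 7794030922126752097),
    ("callId 29 ClientService scan", 3471834208649164937),
    ("RS_OPEN_META-a1215:40020-0", 7794030922126752097),
    ("callId 30 ClientService scan", 3471834208649164937),
    ("callId 31 ClientService scan", 3471834208649164937),
    ("callId 32 ClientService scan", 3471834208649164937),
    ("callId 33 AdminService openRegion", 3471834208649164937),
]


def group_by_handler(spans, roots):
    """Bucket regionserver spans under the master handler that shares their TraceID."""
    groups = defaultdict(list)
    for description, trace_id in spans:
        groups[roots.get(trace_id, "unknown")].append(description)
    return dict(groups)


for handler, descriptions in group_by_handler(RS_SPANS, MASTER_ROOTS).items():
    print(handler)
    for description in descriptions:
        print("   ", description)
```

Only spans 1) and 3) fall under the MetaServerShutdownHandler trace; the remaining five belong to the ServerShutdownHandler trace.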

Most importantly, the total time taken by the regionserver failover can be worked out from
the HMaster trace file alone.
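
A minimal sketch of that calculation, using the master spans from the first dump. Two assumptions: Start/Stop are epoch milliseconds, and ParentID 477902 marks a top-level span (both top-level handlers in the dump carry it). The result covers only the time between the first handler starting and the last one finishing on the master, not the time taken to detect the dead server.

```python
# Assumption: this ParentID marks a top-level span, as seen on both handlers in the dump.
ROOT_PARENT_ID = 477902

# (description, start, stop, parent_id) for the five master spans.
MASTER_SPANS = [
    ("SplitLogManager", 1373415141061, 1373415145255, -4469867469428889343),
    ("MetaServerShutdownHandler", 1373415141027, 1373415145379, ROOT_PARENT_ID),
    ("SplitLogManager", 1373415146016, 1373415150138, 7097670840195911759),
    ("ServerShutdownHandler: AssignmentManager", 1373415150138, 1373415150181,
     7097670840195911759),
    ("ServerShutdownHandler", 1373415145380, 1373415150181, ROOT_PARENT_ID),
]


def failover_window_ms(spans):
    """Time from the earliest top-level start to the latest top-level stop."""
    roots = [s for s in spans if s[3] == ROOT_PARENT_ID]
    first_start = min(start for _, start, _, _ in roots)
    last_stop = max(stop for _, _, stop, _ in roots)
    return last_stop - first_start


print(failover_window_ms(MASTER_SPANS), "ms")  # prints "9154 ms"
```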
                
> Inject MTTR specific traces to get a break up of various steps
> --------------------------------------------------------------
>
>                 Key: HBASE-8911
>                 URL: https://issues.apache.org/jira/browse/HBASE-8911
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>    Affects Versions: 0.95.1
>            Reporter: Himanshu Vashishtha
>         Attachments: 8911-v0.patch
>
>
> There are various steps involved in a regionserver recovery process. This jira adds instrumentation
> at various places in order to get an idea of what steps are involved in a regionserver recovery
> and how much time is spent in each of them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
