hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14988) WASB: Expose WASB status metrics as counters in Hadoop
Date Fri, 27 Oct 2017 10:59:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222168#comment-16222168 ]

Steve Loughran commented on HADOOP-14988:
-----------------------------------------

As discussed in HADOOP-14973

* I concur with the need to collect client-side statistics from the object store clients,
especially those related to failures and throttling, as they answer the question "why are things
so slow?"
* I also see that classic metric publishing isn't always the right way to do it. Sometimes
it is: if a specific node is failing the most, that's a node problem for cluster management
tools to detect and react to. But if it's a specific job being throttled, that's not an admin
problem; it's a job-configuration and store-layout problem, which needs to be reported back at
the job level.
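A minimal sketch of the kind of client-side statistic being asked for, assuming the store client can see the HTTP status of every response: a thread-safe tally per status code, with HTTP 503 treated as the throttling signal (as in the log quoted in the issue description). `StoreStatusCounters` and its methods are hypothetical, not part of WASB or Hadoop. Where the snapshot then goes (node-level metrics or job-level reporting) is exactly the open question above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical per-client tally of HTTP status codes returned by an object store.
 * Not a Hadoop or WASB API; a sketch of the client-side statistics under discussion.
 */
class StoreStatusCounters {
  // One adder per status code; LongAdder keeps contention low when many
  // threads share a single filesystem client instance.
  private final ConcurrentHashMap<Integer, LongAdder> byStatus = new ConcurrentHashMap<>();

  /** Record one response with the given HTTP status code. */
  void record(int status) {
    byStatus.computeIfAbsent(status, k -> new LongAdder()).increment();
  }

  /** Number of responses seen with this status code. */
  long count(int status) {
    LongAdder adder = byStatus.get(status);
    return adder == null ? 0 : adder.sum();
  }

  /** Responses treated as throttling (HTTP 503), per the assumption above. */
  long throttled() {
    return count(503);
  }

  /** Sorted point-in-time copy, to hand to whatever publishes the numbers. */
  Map<Integer, Long> snapshot() {
    Map<Integer, Long> out = new TreeMap<>();
    byStatus.forEach((code, adder) -> out.put(code, adder.sum()));
    return out;
  }
}
```

The same snapshot could feed either a node-level metrics system or a per-job report; the counting itself does not need to know which.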

W.r.t. using Hadoop counters for this: it's cute. But these are not "Hadoop counters", they
are MapReduce counters; a filesystem in hadoop-common can't use or publish them.
That means an alternative way of publishing them is needed.
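One way to respect that layering, sketched under the assumption that the filesystem exposes only plain name/value statistics and each engine maps them onto its own counters. `StatisticsSource` and `CounterBridge` are hypothetical names (hadoop-common's existing `org.apache.hadoop.fs.StorageStatistics` class plays a comparable role for real filesystems); the point is that nothing on the filesystem side references MapReduce types.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** Hypothetical read-only view a filesystem in hadoop-common could expose; no MR types. */
interface StatisticsSource {
  Map<String, Long> statistics();
}

/** Hypothetical engine-side bridge: each engine supplies its own counter sink. */
final class CounterBridge {
  private CounterBridge() {}

  /** Copy every statistic into the engine's counters, prefixing names with the scheme. */
  static void publish(String scheme, StatisticsSource source, BiConsumer<String, Long> sink) {
    // Sorting gives a stable publication order regardless of the source's map type.
    new TreeMap<>(source.statistics())
        .forEach((name, value) -> sink.accept(scheme + "." + name, value));
  }
}
```

An MR task could pass a sink that increments its own counters, while another engine could pass one that feeds its own task metrics; the filesystem code is identical in both cases.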

# Hadoop MR could collect the stats from the output filesystem and promote them to MR counters.
Issue: do you want this per filesystem instance, or aggregated across all instances of a filesystem class?
# The stats could be collected by the committer and propagated back anyway. This is what I'm
doing in the S3A committers, where I write the stats to _SUCCESS. But that's across the entire
set of filesystems of a specific scheme (s3a:// here), not per query (moot in MR, different
in Spark).
# Mingliang's per-thread work here is more foundational, as you want all the stats for a task.
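The per-thread option can be sketched as follows, assuming each task runs on one worker thread for its lifetime (like the executor thread in the log below). Each thread updates only its own slot, so `current()` yields one task's numbers while `totalThrottled()` still yields the filesystem-wide aggregate the first option asks about. `PerThreadStoreStats` is a hypothetical illustration, not Mingliang's actual design.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical per-thread throttling statistics (illustrative only).
 * Each worker thread writes only to its own ThreadStats, so no locking is needed
 * on the hot path, while a shared registry still allows filesystem-wide totals.
 */
class PerThreadStoreStats {
  /** Counts for a single thread; written only by that thread. */
  static final class ThreadStats {
    volatile long requests;
    volatile long throttled;
  }

  // Registry of every thread's slot, so totals can be summed across threads.
  private final ConcurrentLinkedQueue<ThreadStats> all = new ConcurrentLinkedQueue<>();

  // Creates and registers a slot the first time a thread touches these stats.
  private final ThreadLocal<ThreadStats> local = ThreadLocal.withInitial(() -> {
    ThreadStats s = new ThreadStats();
    all.add(s);
    return s;
  });

  /** Called by the store client after each response on the current thread. */
  void onResponse(int status) {
    ThreadStats s = local.get();
    s.requests++;   // safe: only the owning thread ever writes this slot
    if (status == 503) {
      s.throttled++;
    }
  }

  /** Statistics for the calling thread only, i.e. the task running on it. */
  ThreadStats current() {
    return local.get();
  }

  /** Sum over all threads: the aggregate, per-filesystem view. */
  long totalThrottled() {
    long sum = 0;
    for (ThreadStats s : all) {
      sum += s.throttled;
    }
    return sum;
  }
}
```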

Overall, then: yes, I want the counters, not things lost in logs. But we need something
which (a) is cross-engine and (b) works on multitenant execution engines, and so can tie stats
back to specific jobs.
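A sketch of how stats might be tied back to jobs without depending on any one engine, assuming the engine can tag the worker thread with an opaque job ID before running a task. `JobScopedStats`, `runForJob` and the `"(none)"` bucket are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical engine-neutral attribution of store throttling to jobs. The engine
 * tags the worker thread with an opaque job ID; the store client only sees a string.
 */
class JobScopedStats {
  private static final String UNATTRIBUTED = "(none)";
  private final ThreadLocal<String> currentJob = ThreadLocal.withInitial(() -> UNATTRIBUTED);
  private final ConcurrentHashMap<String, LongAdder> throttledByJob = new ConcurrentHashMap<>();

  /** Engine side: run a task with its job ID attached to the current thread. */
  void runForJob(String jobId, Runnable task) {
    String previous = currentJob.get();
    currentJob.set(jobId);
    try {
      task.run();
    } finally {
      currentJob.set(previous); // restore: executor threads are reused across jobs
    }
  }

  /** Store-client side: called on every throttled (503) response. */
  void onThrottled() {
    throttledByJob.computeIfAbsent(currentJob.get(), k -> new LongAdder()).increment();
  }

  /** Per-job totals, so throttling can be reported to the job that suffered it. */
  Map<String, Long> report() {
    Map<String, Long> out = new TreeMap<>();
    throttledByJob.forEach((job, n) -> out.put(job, n.sum()));
    return out;
  }
}
```

A real design would also have to carry the job ID across thread hand-offs inside the store client (pools, asynchronous uploads), which a plain `ThreadLocal` does not do.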



> WASB: Expose WASB status metrics as counters in Hadoop
> ------------------------------------------------------
>
>                 Key: HADOOP-14988
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14988
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>
> It would be good to expose WASB status metrics (e.g. 503) as Hadoop counters.
> Here is an example from a Spark job, where it ends up spending a large amount of time in
> retries. Adding Hadoop counters would help in analyzing and tuning long-running tasks.
> {noformat}
> 2017-10-23 23:07:20,876 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest:   threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:20,877 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=1, ETAG=null, contentLength=198, requestMethod=GET
> 2017-10-23 23:07:21,877 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest:   threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:21,879 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=2, ETAG=null, contentLength=198, requestMethod=GET
> 2017-10-23 23:07:24,070 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest:   threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:24,073 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=3, ETAG=null, contentLength=198, requestMethod=GET
> 2017-10-23 23:07:27,917 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest:   threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:27,920 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=2, ETAG=null, contentLength=198, requestMethod=GET
> 2017-10-23 23:07:36,879 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest:   threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:36,881 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=1, ETAG=null, contentLength=198, requestMethod=GET
> 2017-10-23 23:07:54,786 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest:   threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:54,789 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=3, ETAG=null, contentLength=198, requestMethod=GET
> 2017-10-23 23:08:24,790 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest:   threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:08:24,794 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=4, ETAG=null, contentLength=198, requestMethod=GET
> 2017-10-23 23:08:54,794 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest:   threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> {noformat}
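To show what a counter would have surfaced directly, here is a hypothetical offline summary of the DEBUG output in the issue description (`RetryLogSummary` is illustrative, not a Hadoop tool). It assumes each log entry sits on a single line (an optional leading `>` quote marker is tolerated), counts `Status=503` responses, sums their `Elapsed(ms)`, and measures the time from the first `SendingRequest` to the last.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical offline summary of SelfThrottlingIntercept DEBUG output.
 * Expects one log entry per line; lines without a leading timestamp are ignored.
 */
final class RetryLogSummary {
  private static final Pattern LEADING_TS =
      Pattern.compile("^(?:>\\s*)?(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) ");
  private static final DateTimeFormatter TS_FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
  private static final Pattern ELAPSED = Pattern.compile("Elapsed\\(ms\\)=(\\d+)");

  int throttledResponses;              // responses carrying Status=503
  long requestTimeMs;                  // sum of Elapsed(ms) over those responses
  Duration wallClock = Duration.ZERO;  // first SendingRequest to last SendingRequest

  static RetryLogSummary of(List<String> lines) {
    RetryLogSummary s = new RetryLogSummary();
    LocalDateTime first = null;
    LocalDateTime last = null;
    for (String line : lines) {
      Matcher ts = LEADING_TS.matcher(line);
      if (!ts.find()) {
        continue; // not a log entry, e.g. the {noformat} markers
      }
      if (line.contains("SendingRequest")) {
        LocalDateTime when = LocalDateTime.parse(ts.group(1), TS_FORMAT);
        if (first == null) {
          first = when;
        }
        last = when;
      } else if (line.contains("Status=503")) {
        s.throttledResponses++;
        Matcher elapsed = ELAPSED.matcher(line);
        if (elapsed.find()) {
          s.requestTimeMs += Long.parseLong(elapsed.group(1));
        }
      }
    }
    if (first != null) {
      s.wallClock = Duration.between(first, last);
    }
    return s;
  }
}
```

Applied to the fifteen log entries quoted in the issue (each joined onto one line), it reports 7 throttled responses totalling 16 ms of request time, against roughly 94 seconds between the first request and the last retry: nearly all of that task's time went into waiting between retries.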



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

