hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13453) S3Guard: Instrument new functionality with Hadoop metrics.
Date Fri, 10 Mar 2017 15:14:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905231#comment-15905231
] 

Steve Loughran commented on HADOOP-13453:
-----------------------------------------

I'm afraid HADOOP-13914 has just broken the patch, which means, sadly, you get to do the merge.
Let's get this in *before* anything else traumatic comes in, so other patches get to suffer
next time.

I like what you've done measuring latency as well as counts. I think we could actually do
this more broadly. I think the timing should be recorded in a finally clause though, so
timings for failures get included too. (Side issue: count successes and failures separately,
with different timings?)
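
To make the finally-clause point concrete, here is a minimal sketch in plain Java. It is not code from the patch: TimedOps and its fields are hypothetical names, and simple AtomicLong counters stand in for whatever Hadoop metrics types the patch actually uses. The elapsed time is recorded in the finally block, so failed calls are timed too, and successes and failures are kept apart.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not from the patch): times an operation in a finally block. */
final class TimedOps {
  // Plain counters as stand-ins for real metrics objects.
  final AtomicLong successes = new AtomicLong();
  final AtomicLong failures = new AtomicLong();
  final AtomicLong successNanos = new AtomicLong();
  final AtomicLong failureNanos = new AtomicLong();

  <T> T time(Callable<T> op) throws Exception {
    long start = System.nanoTime();
    boolean ok = false;
    try {
      T result = op.call();
      ok = true;
      return result;
    } finally {
      // Runs on both normal return and exception, so failures are measured as well.
      long elapsed = System.nanoTime() - start;
      if (ok) {
        successes.incrementAndGet();
        successNanos.addAndGet(elapsed);
      } else {
        failures.incrementAndGet();
        failureNanos.addAndGet(elapsed);
      }
    }
  }
}
```

A caller would wrap each metastore call, e.g. `ops.time(() -> store.get(path))`, where `store` and `path` are whatever the caller already has. The exception still propagates unchanged after the timing is recorded.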

I would like to think about how we could avoid having to pass the instrumentation around
all the time. Ideally, we could just pass it in as a constructor argument to the metadata store.
Alternatively, that store could collect metrics and we could wire it up, but I don't see an easy
way to do that in Hadoop metrics (compared to Coda Hale's). The easiest would be just to pass
the S3AInstrumentation (or an inner class) down, but currently the metastore interface is not
specific to S3A.

If we add an interface for metadata store instrumentation, then S3AInstrumentation can implement
it in an inner class, and S3AFS can pass it down during initialization. That would let the
metastore do all it wants, with well-defined strings, of course.
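
Roughly what I have in mind, as a sketch only. Every name here is made up for illustration: MetastoreInstrumentation, FsInstrumentation (a stand-in for S3AInstrumentation, whose real API is different), DemoMetadataStore and the counter name are all assumptions, not existing Hadoop classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical interface: all a metadata store ever sees of the metrics. */
interface MetastoreInstrumentation {
  void incrementCounter(String name, long delta);
}

/** Stand-in for S3AInstrumentation; the real class has a different API. */
final class FsInstrumentation {
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

  long counter(String name) {
    AtomicLong c = counters.get(name);
    return c == null ? 0 : c.get();
  }

  /** Inner class implementing the store-facing interface. */
  final class StoreView implements MetastoreInstrumentation {
    @Override
    public void incrementCounter(String name, long delta) {
      counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
    }
  }
}

/** Hypothetical store: receives its instrumentation once, at initialization. */
final class DemoMetadataStore {
  // A well-defined metric name, owned by the store.
  static final String PUT_OPS = "metastore_put_ops";

  private MetastoreInstrumentation instrumentation;

  void initialize(MetastoreInstrumentation instrumentation) {
    this.instrumentation = instrumentation;
  }

  void put(String path) {
    // The actual write to the backing store would happen here.
    instrumentation.incrementCounter(PUT_OPS, 1);
  }
}
```

The filesystem would call `store.initialize(fsInstrumentation.new StoreView())` once, and nothing else has to carry the instrumentation around. The store depends only on the small interface, so it stays usable outside S3A.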

What do people think?


> S3Guard: Instrument new functionality with Hadoop metrics.
> ----------------------------------------------------------
>
>                 Key: HADOOP-13453
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13453
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Ai Deng
>         Attachments: HADOOP-13453-HADOOP-13345-001.patch, HADOOP-13453-HADOOP-13345-002.patch
>
>
> Provide Hadoop metrics showing operational details of the S3Guard implementation.
> The metrics will be implemented in this ticket:
> ● S3GuardRechecksNthPercentileLatency (MutableQuantiles): Percentile time spent
> in rechecks attempting to achieve consistency. Repeated for multiple percentile values
> of N. This metric is an indicator of the additional latency cost of running S3A with
> S3Guard.
> ● S3GuardRechecksNumOps (MutableQuantiles): Number of times a consistency
> recheck was required while attempting to achieve consistency.
> ● S3GuardStoreNthPercentileLatency (MutableQuantiles): Percentile time spent in
> operations against the consistent store, including both write operations during file
> system mutations and read operations during file system consistency checks. Repeated for
> multiple percentile values of N. This metric is an indicator of latency to the
> consistent store implementation.
> ● S3GuardConsistencyStoreNumOps (MutableQuantiles): Number of operations
> against the consistent store, including both write operations during file system mutations
> and read operations during file system consistency checks.
> ● S3GuardConsistencyStoreFailures (MutableCounterLong): Number of failures
> during operations against the consistent store implementation.
> ● S3GuardConsistencyStoreTimeouts (MutableCounterLong): Number of timeouts
> during operations against the consistent store implementation.
> ● S3GuardInconsistencies (MutableCounterLong): Count of times S3Guard failed to
> achieve consistency, even after exhausting all rechecks. A high count may indicate
> unexpected out-of-band modification of the S3 bucket contents, such as by an external
> tool that does not make corresponding updates to the consistent store.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

