hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13453) S3Guard: Instrument new functionality with Hadoop metrics.
Date Tue, 17 Jan 2017 11:51:27 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825927#comment-15825927 ]

Steve Loughran commented on HADOOP-13453:

They're going to have to go into that file, because those are the metrics published by the
S3A filesystem when deployed, returned by S3AStorageStatistics in a call to {{S3AFileSystem.getStorageStatistics()}},
and printed in {{S3AFileSystem.toString()}}. We could choose whether to add the S3Guard-specific metrics
to every S3A FS instance; that's something to consider. Listing the values on every instance, and returning 0
for all of those gauges and counters when S3Guard isn't in use, is the most consistent option.
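A minimal sketch of the "list everything, report 0 when unused" idea, assuming a simplified model rather than Hadoop's actual {{S3AStorageStatistics}}: the enum {{SketchStatistic}} and the class {{SketchStorageStatistics}} are hypothetical, and {{s3guard_enabled}}/{{s3guard_authoritative}} are simply the names proposed later in this comment. Every statistic is registered when the instance is created, so {{toString()}} prints the same set of names whether or not S3Guard is in use.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical statistic names; each maps to the string that gets published. */
enum SketchStatistic {
  OBJECT_LIST_REQUESTS("object_list_requests"),
  S3GUARD_ENABLED("s3guard_enabled"),
  S3GUARD_AUTHORITATIVE("s3guard_authoritative");

  private final String symbol;

  SketchStatistic(String symbol) {
    this.symbol = symbol;
  }

  String getSymbol() {
    return symbol;
  }
}

/** Simplified, hypothetical stand-in for per-filesystem storage statistics. */
class SketchStorageStatistics {
  private final Map<SketchStatistic, AtomicLong> values =
      new EnumMap<>(SketchStatistic.class);

  SketchStorageStatistics() {
    // Register every statistic up front: all instances list the same names,
    // and statistics that are never updated simply stay at 0.
    for (SketchStatistic s : SketchStatistic.values()) {
      values.put(s, new AtomicLong(0));
    }
  }

  long increment(SketchStatistic s, long count) {
    return values.get(s).addAndGet(count);
  }

  long getLong(SketchStatistic s) {
    return values.get(s).get();
  }

  @Override
  public String toString() {
    // EnumMap iterates in declaration order, so the output is stable.
    StringJoiner joiner = new StringJoiner(", ", "{", "}");
    for (Map.Entry<SketchStatistic, AtomicLong> e : values.entrySet()) {
      joiner.add(e.getKey().getSymbol() + "=" + e.getValue().get());
    }
    return joiner.toString();
  }
}
```

On a fresh instance, toString() returns "{object_list_requests=0, s3guard_enabled=0, s3guard_authoritative=0}". Registering the S3Guard statistics only when S3Guard is enabled would also work, but then the printed set of names would differ from one instance to the next.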

Don't worry about the class length: if you look at it in detail, there are two nested classes
plus support methods specifically for the output/input streams; you don't need to go there. The rest
of the code is fairly simple.

# Add new values to {{org.apache.hadoop.fs.s3a.Statistic}}, prefixed with {{s3guard_}}
# In {{S3AInstrumentation}}, add counters to the array {{COUNTERS_TO_CREATE}} and gauges to {{GAUGES_TO_CREATE}}
# Pass an instance of the instrumentation down to S3Guard
# Have the code call {{incrementCounter}} and {{incrementGauge}}/{{decrementGauge}} as appropriate
# I'd like a simple counter for each of {{s3guard_enabled}} and {{s3guard_authoritative}}, which will
be 0 when S3Guard isn't running and 1 when the respective boolean is set. Why? Remote visibility.
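The five steps might fit together roughly like this. It is a self-contained sketch built on simplified stand-ins, not the real S3A classes: {{GuardStatistic}}, {{GuardInstrumentation}} and {{GuardedStore}} are hypothetical, as are all statistic names other than {{s3guard_enabled}} and {{s3guard_authoritative}}. Only the array names {{COUNTERS_TO_CREATE}}/{{GAUGES_TO_CREATE}} and the method names {{incrementCounter}}, {{incrementGauge}} and {{decrementGauge}} come from the list above; their signatures here are assumptions.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Step 1: new statistics with an s3guard_ prefix (hypothetical names). */
enum GuardStatistic {
  S3GUARD_ENABLED("s3guard_enabled"),
  S3GUARD_AUTHORITATIVE("s3guard_authoritative"),
  S3GUARD_STORE_OPERATIONS("s3guard_store_operations"),
  S3GUARD_RECHECKS_IN_PROGRESS("s3guard_rechecks_in_progress");

  final String symbol;

  GuardStatistic(String symbol) {
    this.symbol = symbol;
  }
}

/** Step 2: simplified instrumentation that creates its counters and gauges. */
class GuardInstrumentation {
  static final GuardStatistic[] COUNTERS_TO_CREATE = {
      GuardStatistic.S3GUARD_ENABLED,
      GuardStatistic.S3GUARD_AUTHORITATIVE,
      GuardStatistic.S3GUARD_STORE_OPERATIONS,
  };
  static final GuardStatistic[] GAUGES_TO_CREATE = {
      GuardStatistic.S3GUARD_RECHECKS_IN_PROGRESS,
  };

  private final Map<GuardStatistic, AtomicLong> counters = new EnumMap<>(GuardStatistic.class);
  private final Map<GuardStatistic, AtomicLong> gauges = new EnumMap<>(GuardStatistic.class);

  GuardInstrumentation() {
    for (GuardStatistic s : COUNTERS_TO_CREATE) {
      counters.put(s, new AtomicLong());
    }
    for (GuardStatistic s : GAUGES_TO_CREATE) {
      gauges.put(s, new AtomicLong());
    }
  }

  void incrementCounter(GuardStatistic s, long count) {
    counters.get(s).addAndGet(count);
  }

  void incrementGauge(GuardStatistic s, long count) {
    gauges.get(s).addAndGet(count);
  }

  void decrementGauge(GuardStatistic s, long count) {
    gauges.get(s).addAndGet(-count);
  }

  long counterValue(GuardStatistic s) {
    return counters.get(s).get();
  }

  long gaugeValue(GuardStatistic s) {
    return gauges.get(s).get();
  }
}

/** Steps 3 to 5: the instrumentation is passed down and updated by the code. */
class GuardedStore {
  private final GuardInstrumentation instrumentation;

  GuardedStore(GuardInstrumentation instrumentation, boolean enabled, boolean authoritative) {
    this.instrumentation = instrumentation;
    // Step 5: each flag counter stays 0 unless its boolean is set, then becomes 1.
    if (enabled) {
      instrumentation.incrementCounter(GuardStatistic.S3GUARD_ENABLED, 1);
    }
    if (authoritative) {
      instrumentation.incrementCounter(GuardStatistic.S3GUARD_AUTHORITATIVE, 1);
    }
  }

  void put(String path) {
    instrumentation.incrementCounter(GuardStatistic.S3GUARD_STORE_OPERATIONS, 1);
    // A real implementation would write the entry for `path` to the store here.
  }

  void recheck(Runnable check) {
    instrumentation.incrementGauge(GuardStatistic.S3GUARD_RECHECKS_IN_PROGRESS, 1);
    try {
      check.run();
    } finally {
      // Bring the gauge back down even if the recheck throws.
      instrumentation.decrementGauge(GuardStatistic.S3GUARD_RECHECKS_IN_PROGRESS, 1);
    }
  }
}
```

Modelling the two flags as counters follows the wording of step 5; a gauge set once at initialization would carry the same 0/1 information.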

You make a good point: "where are the tests?" The answer is that the metrics can be used to test
the internal state of the S3A classes, so they are implicitly tested there.

Take a look at {{ITestS3ADirectoryPerformance}} for a key example of this: our test cases
use the counters of the various HTTP operations as the means to verify that API calls work
as expected. (Note that S3Guard, by reducing the number of these operations, has complicated the tests.)

That is, you verify that the counters work by asserting that they change as you perform operations
against the DFS. See http://steveloughran.blogspot.co.uk/2016/04/distributed-testing-making-use-of.html
for more of my thinking here.
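The testing pattern above (snapshot a counter, perform an operation, then assert on how much the counter changed) can be sketched as follows. Everything here is hypothetical and simplified: {{CounterDiff}} is an illustrative helper, and {{CountingFs}} fakes an object store that counts LIST calls, with an optional in-memory cache that loosely stands in for S3Guard. The cached case shows why S3Guard complicates such tests: the same operations produce fewer store calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Captures a counter's value so a test can assert on how much it changed. */
class CounterDiff {
  private final AtomicLong counter;
  private final long startValue;

  CounterDiff(AtomicLong counter) {
    this.counter = counter;
    this.startValue = counter.get();
  }

  long diff() {
    return counter.get() - startValue;
  }

  void assertDiffEquals(String message, long expected) {
    long actual = diff();
    if (actual != expected) {
      throw new AssertionError(message + ": expected " + expected + " but was " + actual);
    }
  }
}

/** Fake store that counts LIST calls; the optional cache stands in for S3Guard. */
class CountingFs {
  final AtomicLong listRequests = new AtomicLong();
  private final boolean metadataCache;
  private final Map<String, List<String>> objects = new HashMap<>();
  private final Map<String, List<String>> cache = new HashMap<>();

  CountingFs(boolean metadataCache) {
    this.metadataCache = metadataCache;
  }

  void addObject(String dir, String name) {
    objects.computeIfAbsent(dir, k -> new ArrayList<>()).add(name);
    cache.remove(dir); // invalidate, so the cache never returns a stale listing
  }

  List<String> list(String dir) {
    if (metadataCache && cache.containsKey(dir)) {
      return cache.get(dir); // served locally: no store call is counted
    }
    listRequests.incrementAndGet();
    List<String> result =
        new ArrayList<>(objects.getOrDefault(dir, Collections.emptyList()));
    if (metadataCache) {
      cache.put(dir, result);
    }
    return result;
  }
}
```

For example, listing the same directory twice through new CountingFs(false) moves listRequests by 2, while new CountingFs(true) moves it by only 1, so a test written against the uncached counts would fail once the cache is switched on.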

bq. Sorry for the basic question, I'm really new to working on the Hadoop code base.

Happy to explain my reasoning. We've all started off staring at a vast amount of code that
we don't understand; there are still big bits of Hadoop that I don't go near.

> S3Guard: Instrument new functionality with Hadoop metrics.
> ----------------------------------------------------------
>                 Key: HADOOP-13453
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13453
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Ai Deng
> Provide Hadoop metrics showing operational details of the S3Guard implementation.
> The metrics will be implemented in this ticket:
> ● S3GuardRechecksNthPercentileLatency (MutableQuantiles): Percentile time spent
> in rechecks attempting to achieve consistency. Repeated for multiple percentile values
> of N. This metric is an indicator of the additional latency cost of running S3A with
> S3Guard.
> ● S3GuardRechecksNumOps (MutableQuantiles): Number of times a consistency
> recheck was required while attempting to achieve consistency.
> ● S3GuardStoreNthPercentileLatency (MutableQuantiles): Percentile time spent in
> operations against the consistent store, including both write operations during file
> mutations and read operations during file system consistency checks. Repeated for
> multiple percentile values of N. This metric is an indicator of latency to the
> store implementation.
> ● S3GuardConsistencyStoreNumOps (MutableQuantiles): Number of operations
> against the consistent store, including both write operations during file system mutations
> and read operations during file system consistency checks.
> ● S3GuardConsistencyStoreFailures (MutableCounterLong): Number of failures
> during operations against the consistent store implementation.
> ● S3GuardConsistencyStoreTimeouts (MutableCounterLong): Number of timeouts
> during operations against the consistent store implementation.
> ● S3GuardInconsistencies (MutableCounterLong): Count of times S3Guard failed to
> achieve consistency, even after exhausting all rechecks. A high count may indicate
> unexpected out-of-band modification of the S3 bucket contents, such as by an external
> tool that does not make corresponding updates to the consistent store.
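Several of the metrics listed above are percentile latencies ("Repeated for multiple percentile values of N") paired with an operation count. As a rough illustration of what those numbers mean, the hypothetical LatencyPercentiles class below keeps every sample and computes exact nearest-rank percentiles. This is a deliberately simpler technique than Hadoop's MutableQuantiles, which estimates quantiles approximately over a rolling interval instead of storing all samples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, simplified latency recorder: exact nearest-rank percentiles
 * over every recorded sample (not the streaming estimator MutableQuantiles uses).
 */
class LatencyPercentiles {
  private final List<Long> samplesMillis = new ArrayList<>();

  synchronized void add(long millis) {
    samplesMillis.add(millis);
  }

  /** Analogous to a "NumOps" value: how many samples have been recorded. */
  synchronized int numOps() {
    return samplesMillis.size();
  }

  /** Nearest-rank percentile for 0 < n <= 100. */
  synchronized long percentile(int n) {
    if (n <= 0 || n > 100) {
      throw new IllegalArgumentException("n must be in (0, 100]: " + n);
    }
    if (samplesMillis.isEmpty()) {
      throw new IllegalStateException("no samples recorded");
    }
    List<Long> sorted = new ArrayList<>(samplesMillis);
    Collections.sort(sorted);
    // rank = ceil(n / 100 * count), computed with integer arithmetic
    int rank = (int) (((long) n * sorted.size() + 99) / 100);
    return sorted.get(rank - 1);
  }
}
```

Recording latencies of 1 to 10 ms gives percentile(50) = 5, percentile(90) = 9, percentile(99) = 10 and numOps() = 10. Keeping every sample is fine for a sketch but grows without bound, which is one reason a production metric would use a bounded, approximate estimator instead.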

This message was sent by Atlassian JIRA
