hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13453) S3Guard: Instrument new functionality with Hadoop metrics.
Date Mon, 23 Jan 2017 12:13:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834359#comment-15834359
] 

Steve Loughran commented on HADOOP-13453:
-----------------------------------------

Hi, don't worry about asking questions; we'll do our best to get you contributing code. It
benefits all of us if you are adding code to Hadoop.

On the split between the low-level "increment named counter" calls and the more elegant "event
with internal counters" methods: the event ones are cleaner, as they stop the rest of the code
having to know exactly which counters/gauges to use. Consider the elegant ones the best
approach, and the direct invocation us being lazy.

The S3AInstrumentation class also has a set of explicitly named counter fields, such as
"filesDeleted", as well as lots of counters and gauges that are only listed in the arrays
{{GAUGES_TO_CREATE}} and {{COUNTERS_TO_CREATE}}. That's evolution over time; I got bored of
having to name and register lots of fields, and realised I could do it from the arrays, at the
cost of a hash lookup on every increment.
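
A minimal sketch of that trade-off, using only the JDK. The class {{SketchInstrumentation}} and the counter names "op_rename" and "op_mkdirs" are hypothetical, invented for illustration; this is not the actual S3A instrumentation code, which builds on the Hadoop metrics classes rather than plain atomics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for an instrumentation class; not the real S3A code. */
class SketchInstrumentation {
  /** Counters registered in bulk; these names are invented for the example. */
  static final String[] COUNTERS_TO_CREATE = {"op_rename", "op_mkdirs"};

  /** Explicit field: no lookup on increment, but one declaration per counter. */
  private final AtomicLong filesDeleted = new AtomicLong();

  /** Array-driven counters: nothing to declare, but a hash lookup per increment. */
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

  SketchInstrumentation() {
    for (String name : COUNTERS_TO_CREATE) {
      counters.put(name, new AtomicLong());
    }
  }

  /** Event-style method: callers never see the counter itself. */
  void fileDeleted(long count) {
    filesDeleted.addAndGet(count);
  }

  /** Low-level increment by name; unregistered names are ignored. */
  void incrementCounter(String name, long delta) {
    AtomicLong counter = counters.get(name);
    if (counter != null) {
      counter.addAndGet(delta);
    }
  }

  long getFilesDeleted() {
    return filesDeleted.get();
  }

  /** Returns the current value, or -1 if the name was never registered. */
  long getCounterValue(String name) {
    AtomicLong counter = counters.get(name);
    return counter == null ? -1 : counter.get();
  }
}
```

Callers of {{fileDeleted()}} report an event and stay unaware of which counter backs it, while callers of {{incrementCounter()}} must know the counter name and pay for the map lookup.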

Outside the S3a class itself, I've tried to have separate inner classes do the counting,
with the results merged in at the end (example: the input and output streams), with the inner
classes using simple long values, rather than atomics. Why? It eliminates any delays during
increments, and lets us override the toString() values for input/output streams with dumps of
the values (go on, try it!). We can have many input/output streams per FS instance, so the risk
of contention for atomic int/long values is potentially quite high.
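
The per-stream pattern could look roughly like this. All names here ({{SketchFsInstrumentation}}, {{InputStreamStats}}, {{readCompleted()}}) are hypothetical, and the sketch assumes each stream's statistics object is only updated by one thread at a time, which is what makes plain long fields acceptable.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical FS-wide instrumentation; shared by every stream of one FS instance. */
class SketchFsInstrumentation {
  private final AtomicLong totalBytesRead = new AtomicLong();
  private final AtomicLong totalReadOperations = new AtomicLong();

  /** Each stream gets its own statistics object. */
  InputStreamStats newInputStreamStats() {
    return new InputStreamStats();
  }

  long getTotalBytesRead() {
    return totalBytesRead.get();
  }

  long getTotalReadOperations() {
    return totalReadOperations.get();
  }

  /** Per-stream counters: plain longs, so increments never contend. */
  final class InputStreamStats {
    private long bytesRead;
    private long readOperations;
    private boolean merged;

    /** Called on the stream's own thread after each read. */
    void readCompleted(long bytes) {
      readOperations++;
      bytesRead += bytes;
    }

    /** Merge into the shared atomics exactly once, when the stream closes. */
    void close() {
      if (!merged) {
        merged = true;
        totalBytesRead.addAndGet(bytesRead);
        totalReadOperations.addAndGet(readOperations);
      }
    }

    /** A stream's toString() can delegate here to dump its own values. */
    @Override
    public String toString() {
      return "InputStreamStats{bytesRead=" + bytesRead
          + ", readOperations=" + readOperations + '}';
    }
  }
}
```

The shared atomics are touched once per stream lifetime instead of once per read, so many concurrent streams do not fight over the same counters.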

I think for s3guard we could add a new inner class passed in to each s3guard instance; it
would export the various methods for events that s3guard could raise, such as {{tableCreated()}}
and {{tableDeleted()}}. These can directly increment the atomic counters in the instrumentation,
as we'd only have a 1:1 mapping between an S3aFS instance and an s3guard store instance.
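
As a rough sketch of that proposal: only {{tableCreated()}} and {{tableDeleted()}} come from the comment above; the class names {{SketchS3AInstrumentation}} and {{S3GuardEvents}}, the factory method, and the getters are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical instrumentation holder, one per filesystem instance. */
class SketchS3AInstrumentation {
  private final AtomicLong tablesCreated = new AtomicLong();
  private final AtomicLong tablesDeleted = new AtomicLong();

  /** Create the event sink to pass in to the s3guard store instance. */
  S3GuardEvents newS3GuardEvents() {
    return new S3GuardEvents();
  }

  long getTablesCreated() {
    return tablesCreated.get();
  }

  long getTablesDeleted() {
    return tablesDeleted.get();
  }

  /**
   * Event methods the store can call. With a single store per filesystem
   * there is no per-stream fan-out, so the shared atomics are updated directly
   * rather than being merged in later.
   */
  final class S3GuardEvents {
    void tableCreated() {
      tablesCreated.incrementAndGet();
    }

    void tableDeleted() {
      tablesDeleted.incrementAndGet();
    }
  }
}
```

The store only sees the event methods, so the choice of counters behind them can change without touching the s3guard code.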

Regarding access to the statistics, that's hooked up to {{FileSystem.getStorageStatistics()}},
which is intended to provide the storage stats for any FS; s3a and HDFS share common statistic
names for the common statistics. The latest versions of Tez do collect the statistics of jobs,
and so give you the aggregate statistics across your entire query. Until now, only
{{FileSystem.getStatistics()}} has been used, which returns a fixed set of values (bytes
read/written, etc). Spark still only collects those; it'd take some migration to Hadoop 2.8+
to pick up the new data. Until then, it's something we can use in tests.



> S3Guard: Instrument new functionality with Hadoop metrics.
> ----------------------------------------------------------
>
>                 Key: HADOOP-13453
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13453
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Ai Deng
>
> Provide Hadoop metrics showing operational details of the S3Guard implementation.
> The metrics will be implemented in this ticket:
> ● S3GuardRechecksNthPercentileLatency (MutableQuantiles) - Percentile time spent
> in rechecks attempting to achieve consistency. Repeated for multiple percentile values
> of N. This metric is an indicator of the additional latency cost of running S3A with
> S3Guard.
> ● S3GuardRechecksNumOps (MutableQuantiles) - Number of times a consistency
> recheck was required while attempting to achieve consistency.
> ● S3GuardStoreNthPercentileLatency (MutableQuantiles) - Percentile time spent in
> operations against the consistent store, including both write operations during file
> system mutations and read operations during file system consistency checks. Repeated for
> multiple percentile values of N. This metric is an indicator of latency to the consistent
> store implementation.
> ● S3GuardConsistencyStoreNumOps (MutableQuantiles) - Number of operations
> against the consistent store, including both write operations during file system mutations
> and read operations during file system consistency checks.
> ● S3GuardConsistencyStoreFailures (MutableCounterLong) - Number of failures
> during operations against the consistent store implementation.
> ● S3GuardConsistencyStoreTimeouts (MutableCounterLong) - Number of timeouts
> during operations against the consistent store implementation.
> ● S3GuardInconsistencies (MutableCounterLong) - Count of times S3Guard failed to
> achieve consistency, even after exhausting all rechecks. A high count may indicate
> unexpected out-of-band modification of the S3 bucket contents, such as by an external
> tool that does not make corresponding updates to the consistent store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

