hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14973) [s3a] Log StorageStatistics
Date Thu, 26 Oct 2017 17:38:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220862#comment-16220862 ]

Aaron Fabbri commented on HADOOP-14973:

I've seen the toString() stuff, but it doesn't give us a way of logging periodic stats without
requiring customers to change job code, right?

I'd like a configurable way to get periodic statistics logged while we work on getting this
stuff plumbed through the major compute engines.  It would be nice to have something to look
at when someone wants to know why their job is running slowly, without requiring job changes.
Too much spam in logs is a concern though.  [~stevel@apache.org] what do you think about having
a config knob like {{fs.s3a.statistics.log.seconds}} or {{fs.s3a.statistics.log.on.close}}?
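
To make the two proposed knobs concrete, here is a minimal sketch of what they could drive. Everything in it is an assumption for illustration: {{StatisticsLogger}} is a hypothetical class, not part of Hadoop, and the two constructor arguments stand in for the proposed (not existing) properties {{fs.s3a.statistics.log.seconds}} and {{fs.s3a.statistics.log.on.close}}. The {{Supplier}} is a placeholder for however S3A would snapshot its StorageStatistics, and the {{Consumer}} for whatever logger it would write to.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper, not part of Hadoop: logs a statistics snapshot periodically and/or on close. */
final class StatisticsLogger implements AutoCloseable {
    private final Supplier<Map<String, Long>> source; // stand-in for a StorageStatistics snapshot
    private final Consumer<String> sink;              // stand-in for a logger
    private final boolean logOnClose;                 // would come from fs.s3a.statistics.log.on.close (proposed)
    private final ScheduledExecutorService scheduler; // null when periodic logging is disabled

    /** intervalSeconds would come from fs.s3a.statistics.log.seconds (proposed); 0 or less disables it. */
    StatisticsLogger(Supplier<Map<String, Long>> source, Consumer<String> sink,
                     long intervalSeconds, boolean logOnClose) {
        this.source = source;
        this.sink = sink;
        this.logOnClose = logOnClose;
        ScheduledExecutorService s = null;
        if (intervalSeconds > 0) {
            // Daemon thread, so the logger never keeps the host process alive.
            s = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "s3a-stats-logger");
                t.setDaemon(true);
                return t;
            });
            s.scheduleAtFixedRate(this::logOnce, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        }
        this.scheduler = s;
    }

    /** Renders all counters on a single line to limit log spam. */
    static String format(Map<String, Long> stats) {
        StringBuilder sb = new StringBuilder("S3A statistics:");
        stats.forEach((name, value) -> sb.append(' ').append(name).append('=').append(value));
        return sb.toString();
    }

    void logOnce() {
        try {
            sink.accept(format(source.get()));
        } catch (RuntimeException e) {
            // Swallow: an uncaught exception would cancel the periodic task.
        }
    }

    @Override
    public void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
        if (logOnClose) {
            logOnce();
        }
    }
}
```

Both knobs default naturally to "off" here (interval 0, on-close false), which would keep logs quiet unless someone is actively debugging a slow job.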

 Note also that aggregation by sum no longer works for many of these metrics (obvious but
worth mentioning).

> [s3a] Log StorageStatistics
> ---------------------------
>                 Key: HADOOP-14973
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14973
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1, 2.8.1
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
> S3A is currently storing much more detailed metrics via StorageStatistics than are logged
> in a MapReduce job. Eventually, it would be nice to get Spark, MapReduce and other workloads
> to retrieve and store these metrics, but it may be some time before they all do that. I'd
> like to consider having S3A publish the metrics itself in some form. This is tricky, as S3A
> has no daemon but lives inside various other processes.
> Perhaps writing to a log file at some configurable interval and on close() would be the
> best we could do. Other ideas would be welcome.
