hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-14973) [s3a] Log StorageStatistics
Date Fri, 27 Oct 2017 20:21:01 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222690#comment-16222690 ]

Steve Loughran edited comment on HADOOP-14973 at 10/27/17 8:20 PM:
-------------------------------------------------------------------

If your customers want to know how to make effective use of a hadoop cluster in AWS then I
believe we can assist with performance tuning: just send them our way, we'll help :)

tricks
* you can configure S3 buckets to log accesses to another bucket
* you can use the UA settings (fs.s3a.user.agent) to declare what application/workflow is talking to the bucket
* you can use big data analysis tools to go through the logs.
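
To make the last trick concrete, here is a minimal Python sketch that tallies requests per user agent and operation from S3 server access log lines. It rests on several assumptions: the regular expression covers only the leading fields of the access log record as I understand the format, the sample lines and user agent strings are made up for illustration, and the function name is hypothetical. As far as I know, the full S3A option name for the UA setting is fs.s3a.user.agent.prefix, whose value is prepended to the user agent that then shows up in these logs.

```python
import re
from collections import Counter

# Leading fields of an S3 server access log record, up to the user agent.
# Assumption: fields are space separated, the timestamp is in brackets, and
# the request URI, referrer and user agent are double quoted.
LOG_PATTERN = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<remote_ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+) (?P<error_code>\S+) '
    r'(?P<bytes_sent>\S+) (?P<object_size>\S+) (?P<total_time>\S+) '
    r'(?P<turnaround_time>\S+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def requests_by_user_agent(lines):
    """Count requests per (user agent, operation); skip unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        counts[(match.group("user_agent"), match.group("operation"))] += 1
    return counts


# Made-up sample records; real logs would be read from the logging bucket.
SAMPLE_LINES = [
    'ownerid example-bucket [27/Oct/2017:20:21:01 +0000] 192.0.2.3 requester1 '
    'REQ1 REST.GET.OBJECT data/part-0000 "GET /example-bucket/data/part-0000 HTTP/1.1" '
    '200 - 1024 1024 35 30 "-" "etl-nightly, Hadoop 2.8.1" -',
    'ownerid example-bucket [27/Oct/2017:20:21:02 +0000] 192.0.2.3 requester1 '
    'REQ2 REST.GET.OBJECT data/part-0001 "GET /example-bucket/data/part-0001 HTTP/1.1" '
    '200 - 2048 2048 40 33 "-" "etl-nightly, Hadoop 2.8.1" -',
    'ownerid example-bucket [27/Oct/2017:20:21:03 +0000] 192.0.2.9 requester2 '
    'REQ3 REST.HEAD.OBJECT data/part-0000 "HEAD /example-bucket/data/part-0000 HTTP/1.1" '
    '200 - - 1024 12 - "-" "adhoc-query, Hadoop 2.8.1" -',
    'this line is not an access log record',
]

if __name__ == "__main__":
    for (agent, operation), count in requests_by_user_agent(SAMPLE_LINES).most_common():
        print(f"{count:6d}  {operation:20s}  {agent}")
```

Grouping by user agent only helps if each application or workflow sets a distinct UA value, which is why the second trick matters for the third.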



was (Author: stevel@apache.org):
If your customers want to know how to make effective use of a hadoop cluster in AWS then I
believe we can assist with performance tuning: just send them our way, we'll help :)

Tips of the professionals: 
* you can configure S3 buckets to log accesses to another bucket
* you can use the UA settings (fs.s3a.user.agent) to declare what application/workflow is talking to the bucket
* you can use big data analysis tools to go through the logs.


> [s3a] Log StorageStatistics
> ---------------------------
>
>                 Key: HADOOP-14973
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14973
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-beta1, 2.8.1
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>
> S3A is currently storing much more detailed metrics via StorageStatistics than are logged
> in a MapReduce job. Eventually, it would be nice to get Spark, MapReduce and other workloads
> to retrieve and store these metrics, but it may be some time before they all do that. I'd
> like to consider having S3A publish the metrics itself in some form. This is tricky, as S3A
> has no daemon but lives inside various other processes.
> Perhaps writing to a log file at some configurable interval and on close() would be the
> best we could do. Other ideas would be welcome.
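
The idea in the issue description (log at a configurable interval and once more on close()) can be sketched in a language-neutral way. The Python below is only an illustration of that pattern under stated assumptions, not S3A code: the class name, the logger name, and the `get_stats` callback that returns a dict of counters are all hypothetical stand-ins for whatever would read StorageStatistics inside the host process.

```python
import logging
import threading


class PeriodicStatsLogger:
    """Logs a statistics snapshot every interval_seconds and once on close().

    get_stats is a hypothetical callable returning a dict of name -> value.
    """

    def __init__(self, get_stats, interval_seconds, logger=None):
        self._get_stats = get_stats
        self._interval = interval_seconds
        self._logger = logger or logging.getLogger("storage.stats.sketch")
        self._stop = threading.Event()
        # Daemon thread: there is no dedicated daemon process, so the logger
        # must not keep the host process alive on its own.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Event.wait returns False on timeout and True once close() is called.
        while not self._stop.wait(self._interval):
            self.log_once()

    def log_once(self):
        """Write one snapshot as a single log line and return that line."""
        stats = self._get_stats()
        line = " ".join(f"{name}={value}" for name, value in sorted(stats.items()))
        self._logger.info("storage statistics: %s", line)
        return line

    def close(self):
        """Stop the periodic thread and log a final snapshot; idempotent."""
        if self._stop.is_set():
            return
        self._stop.set()
        self._thread.join()
        self.log_once()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Illustrative counter names and values only.
    stats_logger = PeriodicStatsLogger(
        lambda: {"op_open": 2, "object_list_requests": 5}, interval_seconds=60
    )
    stats_logger.close()
```

The final snapshot on close() matters for short-lived tasks that finish before the first interval elapses; without it, such a task would never log its statistics.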



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


