hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests
Date Fri, 29 Apr 2016 00:03:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263292#comment-15263292 ]

Chris Nauroth commented on HADOOP-13028:
----------------------------------------

Hello [~stevel@apache.org].  This looks very useful overall.

I'm a bit confused, because it seems different iterations of the patch have folded in fixes
from other JIRAs.  Can you please clarify for reviewers if we should be reviewing other patches
first?

Since the patch is touching some {{LOG.debug}} statements, would it be helpful to include
{{src}} and {{dst}} in those log messages?

{{S3AFileSystem#removeKeys}} appears to have some subtle bugs.  This is not entirely related
to your patch.  The multi-delete might fail with some objects successfully deleted but others
remaining.  However, the stats only increment if the whole multi-delete succeeds, so the
objects that were deleted in a partial failure are never counted.

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Client.html#deleteObjects(com.amazonaws.services.s3.model.DeleteObjectsRequest)

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/MultiObjectDeleteException.html

Similarly, if multi-delete is disabled, then any individual delete in the loop might throw
an exception and skip the stats increments.
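
To make the accounting concern concrete, here is a minimal sketch with no AWS SDK dependency.
{{FakeStore}}, {{PartialDeleteException}} and the counter are hypothetical stand-ins, not S3A
or SDK classes; the stand-in exception imitates {{MultiObjectDeleteException#getDeletedObjects}}
by carrying the keys that were deleted.  It is only one possible way to do the accounting,
not a proposal for the exact patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

public class DeleteAccounting {

  /** Hypothetical stand-in for the SDK's MultiObjectDeleteException. */
  static class PartialDeleteException extends RuntimeException {
    final List<String> deleted;

    PartialDeleteException(List<String> deleted) {
      super("multi-delete failed for some keys");
      this.deleted = deleted;
    }
  }

  /** Hypothetical in-memory store; keys in failingKeys can never be deleted. */
  static class FakeStore {
    final Set<String> failingKeys;

    FakeStore(Set<String> failingKeys) {
      this.failingKeys = failingKeys;
    }

    /** Bulk delete: removes what it can, then reports a partial failure. */
    void deleteObjects(List<String> keys) {
      List<String> deleted = new ArrayList<>();
      for (String key : keys) {
        if (!failingKeys.contains(key)) {
          deleted.add(key);
        }
      }
      if (deleted.size() < keys.size()) {
        throw new PartialDeleteException(deleted);
      }
    }

    /** Single delete: throws if the key cannot be deleted. */
    void deleteObject(String key) {
      if (failingKeys.contains(key)) {
        throw new IllegalStateException("cannot delete " + key);
      }
    }
  }

  /** Multi-delete path: count the partial successes before rethrowing. */
  static void multiDelete(FakeStore store, List<String> keys, AtomicLong deletes) {
    try {
      store.deleteObjects(keys);
      deletes.addAndGet(keys.size());
    } catch (PartialDeleteException e) {
      deletes.addAndGet(e.deleted.size());
      throw e;
    }
  }

  /** Loop path: increment per key, so an exception cannot skip earlier counts. */
  static void deleteOneByOne(FakeStore store, List<String> keys, AtomicLong deletes) {
    for (String key : keys) {
      store.deleteObject(key);
      deletes.incrementAndGet();
    }
  }

  public static void main(String[] args) {
    FakeStore store = new FakeStore(Collections.singleton("b"));
    List<String> keys = Arrays.asList("a", "b", "c");

    AtomicLong multi = new AtomicLong();
    try {
      multiDelete(store, keys, multi);
    } catch (PartialDeleteException e) {
      System.out.println("multi-delete counted " + multi.get());   // prints 2
    }

    AtomicLong single = new AtomicLong();
    try {
      deleteOneByOne(store, keys, single);
    } catch (IllegalStateException e) {
      System.out.println("one-by-one counted " + single.get());    // prints 1
    }
  }
}
```

With key {{b}} undeletable, the bulk path still records the two objects that were removed,
and the loop path records the one delete that completed before the exception.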

I'll wait for clarification on the prerequisite patches before I take this for
a test run myself.

> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
>                 Key: HADOOP-13028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13028
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, metrics
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, HADOOP-13028-004.patch,
>                      HADOOP-13028-005.patch, HADOOP-13028-006.patch, org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
>                      org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, and closing connections
> may be expensive (a sign of a regression).
> S3A FS and individual input streams should have counters of the number of open/close/failure+reconnect
> operations, and timers of how long things take. This can be used downstream to measure efficiency
> of the code (how often connections are being made), connection reliability, etc.
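
For readers unfamiliar with the idea in the description, a rough sketch of per-stream
counters and a timer might look like the following.  {{StreamCounters}} and all of its
fields and methods are hypothetical names chosen for illustration; they are not taken from
the attached patches, and the open action is passed in as a plain {{Supplier}} rather than
a real S3 call.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class StreamCounters {
  final AtomicLong openOperations = new AtomicLong();
  final AtomicLong closeOperations = new AtomicLong();
  final AtomicLong reconnects = new AtomicLong();
  final AtomicLong openNanos = new AtomicLong();

  /** Runs the supplied open action, counting the attempt and its duration. */
  <T> T timedOpen(Supplier<T> opener) {
    long start = System.nanoTime();
    try {
      return opener.get();
    } finally {
      // Counted in finally so failed attempts are recorded too.
      openOperations.incrementAndGet();
      openNanos.addAndGet(System.nanoTime() - start);
    }
  }

  /** A reconnect after a failure is counted separately and also as an open. */
  <T> T reconnect(Supplier<T> opener) {
    reconnects.incrementAndGet();
    return timedOpen(opener);
  }

  /** Records that a connection was closed. */
  void closed() {
    closeOperations.incrementAndGet();
  }

  @Override
  public String toString() {
    return "opens=" + openOperations.get()
        + " closes=" + closeOperations.get()
        + " reconnects=" + reconnects.get()
        + " openNanos=" + openNanos.get();
  }

  public static void main(String[] args) {
    StreamCounters counters = new StreamCounters();
    counters.timedOpen(() -> "connection-1");   // initial open
    counters.reconnect(() -> "connection-2");   // reopen after a simulated failure
    counters.closed();
    System.out.println(counters);
  }
}
```

A downstream test could then compare the number of opens against the number of reads to
judge how often connections are being made.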



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

