hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests
Date Tue, 10 May 2016 14:07:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278142#comment-15278142 ]

Steve Loughran commented on HADOOP-13028:
-----------------------------------------

# I'll comment in the close().
# We should add a compatibility statement for string values: "no guarantees at all". There's
one for token printing, HDFS-9732, where we've explicitly added a stable string value, {{toStringStable()}},
so that a CLI command gets the same output as before, but that was for the specific case
"output of a command line tool". Maybe we should standardise that method with an interface
and a guarantee: "this method's output doesn't change, provided the libraries and the JDK don't
change their output underneath".
# As it stands, it's useful today, as I've been looking at the printed logs in downstream test runs,
with no attempt to parse them in software. Where it's invaluable is that downstream code doesn't
need to be built exclusively against Hadoop 2.8+, or get access to an API we've agreed to
hide. For example: SPARK-7481.
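
The standardised method suggested in the second point could look something like the sketch below. Only the method name {{toStringStable()}} comes from the comment above; the interface name {{StableStringValue}} and the {{TokenSummary}} class are hypothetical names made up for illustration, not Hadoop APIs, and the output format shown is an assumption.

```java
/** Hypothetical interface; the name is an assumption, not a Hadoop API. */
interface StableStringValue {
  /**
   * Returns a string whose format is intended to stay the same between
   * releases, provided the underlying libraries and the JDK do not change
   * their own output.
   */
  String toStringStable();
}

/** Hypothetical class showing the two kinds of string value side by side. */
final class TokenSummary implements StableStringValue {
  private final String kind;
  private final String service;

  TokenSummary(String kind, String service) {
    this.kind = kind;
    this.service = service;
  }

  /** Stable form: suitable for command-line output that is compared over time. */
  @Override
  public String toStringStable() {
    return "Kind: " + kind + ", Service: " + service;
  }

  /** Diagnostic form: no guarantees at all; the layout may change in any release. */
  @Override
  public String toString() {
    return "TokenSummary{kind=" + kind + ", service=" + service
        + ", identity=" + System.identityHashCode(this) + "}";
  }
}
```

The split keeps {{toString()}} free to change for logging, while anything that prints to a CLI would call {{toStringStable()}} instead.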

I absolutely need that printing in there; otherwise the value of this patch is significantly
reduced. If you want me to add a line like "WARNING: UNSTABLE" or something to that string
value, I'm happy to do so. Alternatively, the output could be published in a way that is deliberately
hard to parse by machine but which we humans can read. But without that information, we can't so easily
tell which

If you do insist on that string being pulled, then I'm going to convert the statistics into
a globally accessible object instead, albeit tagged as @Unstable and @LimitedPrivate("Testing").


> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
>                 Key: HADOOP-13028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13028
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, metrics
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, HADOOP-13028-004.patch,
HADOOP-13028-005.patch, HADOOP-13028-006.patch, HADOOP-13028-007.patch, HADOOP-13028-008.patch,
HADOOP-13028-009.patch, HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch,
HADOOP-13028-branch-2-010.patch, org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, and closing connections
> may also be expensive (a sign of a regression).
> The S3A FS and individual input streams should have counters of the number of open/close/failure+reconnect
> operations, and timers of how long things take. These can be used downstream to measure the efficiency
> of the code (how often connections are being made), connection reliability, etc.
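
As a rough illustration of the counters and timers described in the issue, here is a minimal sketch. It is not the class added by the patch: the name {{StreamCounters}}, its methods, and the string layout are all hypothetical. The {{toString()}} prefix follows the "WARNING: UNSTABLE" idea offered in the comment above.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-stream counters; names and layout are assumptions. */
final class StreamCounters {
  private final AtomicLong opens = new AtomicLong();
  private final AtomicLong closes = new AtomicLong();
  private final AtomicLong reconnects = new AtomicLong();
  private final AtomicLong openNanos = new AtomicLong();

  /** Record one open operation and how long it took. */
  void recordOpen(long durationNanos) {
    opens.incrementAndGet();
    openNanos.addAndGet(durationNanos);
  }

  /** Record one close operation. */
  void recordClose() {
    closes.incrementAndGet();
  }

  /** Record one failure that was followed by a reconnect. */
  void recordReconnect() {
    reconnects.incrementAndGet();
  }

  long getOpens() {
    return opens.get();
  }

  long getCloses() {
    return closes.get();
  }

  long getReconnects() {
    return reconnects.get();
  }

  /** Mean time per open in milliseconds, or 0 if nothing was opened. */
  long getMeanOpenMillis() {
    long n = opens.get();
    return n == 0 ? 0 : TimeUnit.NANOSECONDS.toMillis(openNanos.get() / n);
  }

  /** Human-readable dump for test logs; explicitly not a stable format. */
  @Override
  public String toString() {
    return "WARNING: UNSTABLE StreamCounters{opens=" + getOpens()
        + ", closes=" + getCloses()
        + ", reconnects=" + getReconnects()
        + ", meanOpenMillis=" + getMeanOpenMillis() + "}";
  }
}
```

A downstream test could log the object after a read workload and compare the number of opens against the number of reads by eye, without compiling against any Hadoop-internal API.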




