hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests
Date Mon, 02 May 2016 23:31:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267729#comment-15267729 ]

Chris Nauroth commented on HADOOP-13028:
----------------------------------------

[~stevel@apache.org], I've spent more time reading the seek code changes, and I'm pretty confident
that they're correct overall, but I have a few more comments.

# {{S3AInputStream#closeStream}} has the following log message.  The message text labels the last
value as {{contentLength}}, but the argument actually passed is {{length}}.  I imagine {{length}}
is the more interesting value here, so should the message text be changed to match?
{code}
      LOG.debug("Stream {} {}: {}; streamPos={}, nextReadPos={}," +
          " contentLength={}",
          uri, (shouldAbort ? "aborted":"closed"), reason, pos, nextReadPos,
          length);
{code}
# Actually, that makes me realize I am unclear about a change made in HADOOP-12444.  {{S3AInputStream#reopen}}
has a stream length calculation that gets passed into the range request.
{code}
    requestedStreamLen = (length < 0) ? this.contentLength :
        Math.max(this.contentLength, (CLOSE_THRESHOLD + (targetPos + length)));
    ...
    GetObjectRequest request = new GetObjectRequest(bucket, key)
        .withRange(targetPos, requestedStreamLen);
{code}
Please tell me if I'm misunderstanding something, but I believe this calculation always results
in an upper bound on the range that effectively means "get the whole thing."  That {{Math.max}}
call guarantees that the value is always at least {{contentLength}}, which is the whole file
length.  Is this a bug in the HADOOP-12444 patch?
# {{InputStreamStatistics#seekBackwards}} accepts {{offset}} as an argument but doesn't use
it.  Is there supposed to be another counter for back-skipped bytes?  At the call site within
{{S3AInputStream#seekInStream}}, the value it passes would be negative, so we'd need to be
careful of that.
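
To make items 2 and 3 above concrete, here is a minimal, standalone sketch; it is not code from any HADOOP-13028 or HADOOP-12444 patch. {{quotedLimit}} mirrors the calculation quoted in item 2. {{clampedLimit}}, the {{SeekStats}} class, its {{bytesBackwardsOnSeek}} field, and the {{CLOSE_THRESHOLD}} value of 4096 are assumptions introduced only for illustration.

```java
// Illustrative sketch only; every name below except the quoted formula is hypothetical.
final class RangeRequestSketch {
    // Assumed value for this example; the real constant is defined in S3AInputStream.
    static final long CLOSE_THRESHOLD = 4096;

    /** Mirrors the calculation quoted in item 2: Math.max never goes below contentLength. */
    static long quotedLimit(long contentLength, long targetPos, long length) {
        return (length < 0) ? contentLength
            : Math.max(contentLength, CLOSE_THRESHOLD + (targetPos + length));
    }

    /** Hypothetical alternative: Math.min caps the range end at contentLength instead. */
    static long clampedLimit(long contentLength, long targetPos, long length) {
        return (length < 0) ? contentLength
            : Math.min(contentLength, CLOSE_THRESHOLD + (targetPos + length));
    }

    /** Hypothetical statistics holder showing one way to count back-skipped bytes. */
    static final class SeekStats {
        long backwardSeekOperations;
        long bytesBackwardsOnSeek;

        /** The call site passes a negative offset, so store its magnitude. */
        void seekBackwards(long negativeOffset) {
            backwardSeekOperations++;
            bytesBackwardsOnSeek += Math.abs(negativeOffset);
        }
    }

    public static void main(String[] args) {
        long contentLength = 1_000_000L;
        // A 100-byte read at the start of a 1,000,000-byte object.
        System.out.println("quoted:  " + quotedLimit(contentLength, 0, 100));
        System.out.println("clamped: " + clampedLimit(contentLength, 0, 100));

        SeekStats stats = new SeekStats();
        stats.seekBackwards(-512);
        System.out.println("bytes skipped backwards: " + stats.bytesBackwardsOnSeek);
    }
}
```

Under these assumptions, a 100-byte read at offset 0 of a 1,000,000-byte object yields an upper bound of 1,000,000 from the quoted formula, against 4,196 from the {{Math.min}} variant. Whether the range end passed to {{withRange}} should be inclusive or exclusive is a separate question that this sketch does not address.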


> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
>                 Key: HADOOP-13028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13028
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, metrics
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, HADOOP-13028-004.patch,
> HADOOP-13028-005.patch, HADOOP-13028-006.patch, HADOOP-13028-007.patch, HADOOP-13028-008.patch,
> HADOOP-13028-branch-2-008.patch, org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, and closing connections
> may also be expensive (a sign of a regression).
> S3A FS and individual input streams should have counters of the number of open/close/failure+reconnect
> operations, and timers of how long things take. These can be used downstream to measure the efficiency
> of the code (how often connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

