hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13203) S3a: Consider reducing the number of connection aborts by setting correct length in s3 request
Date Fri, 17 Jun 2016 18:32:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336641#comment-15336641 ]

Steve Loughran commented on HADOOP-13203:

Patch 005. This is a WiP; I just wanted to push it up to show where I'm going here.

The key change is that it introduces the notion of an InputStrategy to S3a, currently: general,
positioned, sequential

As of now, there's also no difference between positioned and general: they both say "to end of stream".
I think general may want to consider having a slightly shorter range, though still
something big.
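
To make the range question concrete, here is a minimal sketch of the two choices: reading "to end of stream", and one possible capped range for general. Everything in it is an assumption for illustration and not code from the patch: the class RangeSketch, both method names, and the 64 MB cap are hypothetical.

```java
/** Hypothetical sketch only; not the S3A code from the patch. */
final class RangeSketch {
    /** ASSUMPTION: an arbitrary cap standing in for "still something big". */
    static final long GENERAL_MAX_RANGE = 64L * 1024 * 1024;

    private RangeSketch() {
    }

    /** "To end of stream": the request runs to the end of the object. */
    static long toEndOfStream(long pos, long contentLength) {
        return contentLength;
    }

    /**
     * One possible shorter range: ask for at least the cap, or the
     * caller's requested length if that is larger, but never past the
     * end of the object. Returns the exclusive end offset of the request.
     */
    static long cappedRange(long pos, long requestedLength, long contentLength) {
        long wanted = Math.max(requestedLength, GENERAL_MAX_RANGE);
        return Math.min(contentLength, pos + wanted);
    }
}
```

With a cap like this, a short read near the start of a large object would request at most the cap rather than the whole remainder, so less data is left unread if the stream later has to be closed.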

The logic of seekInStream has been enhanced so that it does not try seeking if the end of the
range passed in is beyond the end of the current read.
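
The check can be sketched as a pure function. This is an illustration under stated assumptions, not the method in the patch: the class SeekSketch, the method canSeekInStream and all parameter names are hypothetical, and the forward-only rule is an assumption about how an in-stream seek (skipping bytes on the open stream) would work.

```java
/** Hypothetical sketch only; not the S3A code from the patch. */
final class SeekSketch {
    private SeekSketch() {
    }

    /**
     * @param targetPos      offset the caller wants to read from
     * @param length         bytes the caller intends to read from targetPos
     * @param streamPos      current position in the open stream
     * @param currentReadEnd exclusive end offset of the currently open request
     * @return true if seeking within the open stream is worth attempting
     */
    static boolean canSeekInStream(long targetPos, long length,
                                   long streamPos, long currentReadEnd) {
        long rangeEnd = targetPos + length;
        if (rangeEnd > currentReadEnd) {
            // The open request cannot satisfy the whole read: do not try.
            return false;
        }
        // ASSUMPTION: only forward seeks can be served by skipping bytes.
        return targetPos >= streamPos;
    }
}
```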

The metrics track more details on range overshoot.

-now need to test both codepaths. The strategy can be set on an instantiated FS instance to
allow testing without recreating FS instances.
-still wasteful of data in the current read if the next read overshoots (maybe a counter could
track the missed quantity there). The next step would be to have read(byte[]) return the amount
of available data, with the readFully() calls handling the incomplete response by asking for more.
-what would a good policy for "general" be? Not positioned, clearly...but is sequential it?
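
The short-read idea in the list above is the usual readFully loop: read() hands back whatever it has, and the caller keeps asking until the buffer is full. A minimal sketch against plain java.io.InputStream, not the S3AInputStream code; the class name ReadFullySketch is hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch only; not the S3A code from the patch. */
final class ReadFullySketch {
    private ReadFullySketch() {
    }

    /**
     * Fill buffer[offset .. offset + length) from the stream, tolerating
     * read() calls that return fewer bytes than were asked for.
     */
    static void readFully(InputStream in, byte[] buffer, int offset, int length)
            throws IOException {
        int done = 0;
        while (done < length) {
            // read() may return only the bytes currently available.
            int n = in.read(buffer, offset + done, length - done);
            if (n < 0) {
                throw new EOFException("stream ended after " + done
                        + " of " + length + " bytes");
            }
            done += n; // ask for the remainder on the next pass
        }
    }
}
```

The loop terminates because a read() with a non-zero length either returns at least one byte or signals end of stream with -1.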

> S3a: Consider reducing the number of connection aborts by setting correct length in s3
> ----------------------------------------------------------------------------------------------
>                 Key: HADOOP-13203
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13203
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HADOOP-13203-branch-2-001.patch, HADOOP-13203-branch-2-002.patch,
> HADOOP-13203-branch-2-003.patch, HADOOP-13203-branch-2-004.patch, HADOOP-13203-branch-2-005.patch,
> Currently the file's "contentLength" is set as the "requestedStreamLen" when invoking S3AInputStream::reopen().
> As part of lazySeek(), the stream sometimes had to be closed and reopened. But many
> times the stream was closed with abort(), leaving the internal HTTP connection unusable.
> This incurs a lot of connection establishment cost in some jobs. It would be good to set the
> correct value for the stream length to avoid connection aborts.
> I will post the patch once the AWS tests pass on my machine.
