hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13203) S3a: Consider reducing the number of connection aborts by setting correct length in s3 request
Date Tue, 21 Jun 2016 15:57:58 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-13203:
------------------------------------
    Attachment: HADOOP-13203-branch-2-008.patch

Patch 008; tested against S3 Ireland.

This revision adds a test demonstrating what I suspected: reads spanning block boundaries
were going to have problems. It also includes the fix, which is to always call {{seekInStream(pos,
len)}} before a read, even if {{targetPos==currentPos}}. In that situation, the current stream
is closed if currentPos is at the end of the current request range (i.e. there's no seek, but
no data either). The test does block-spanning reads on a file in which the byte at each position
is {{(position % 64)}}; the tests use this to verify that the bytes returned really are the bytes
in the file at the specific read positions.
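A minimal in-memory sketch of the fix and the test pattern described above, assuming each simulated request covers a fixed number of bytes. The class, its fields and `requestSize` are hypothetical; this models the idea, it is not the S3AInputStream code. It shows why the seek check must run even when `targetPos == currentPos`: at the end of a request range there is no seek, but no data either, so a new request must be opened.

```java
import java.io.ByteArrayInputStream;

/**
 * Hypothetical in-memory model of block-spanning ranged reads; NOT the real
 * S3AInputStream. Each simulated "HTTP request" covers at most requestSize
 * bytes, i.e. the range [pos, min(pos + requestSize, object length)).
 */
public class BlockSpanningReadSketch {
    private final byte[] object;           // the whole simulated blob
    private final int requestSize;         // assumed fixed size of each ranged request
    private ByteArrayInputStream wrapped;  // stands in for the HTTP response body
    private long currentPos = -1;          // next position the open stream returns
    private long rangeFinish = -1;         // exclusive end of the open request range
    private int reopenCount = 0;

    public BlockSpanningReadSketch(byte[] object, int requestSize) {
        this.object = object;
        this.requestSize = requestSize;
    }

    /** Test data: the byte at each position is (position % 64). */
    public static byte[] buildTestData(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) (i % 64);
        }
        return data;
    }

    /** True if buf holds exactly the bytes of the test file starting at pos. */
    public static boolean verify(byte[] buf, long pos) {
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] != (byte) ((pos + i) % 64)) {
                return false;
            }
        }
        return true;
    }

    private void reopen(long pos) {
        // A real stream would close the previous HTTP response here.
        rangeFinish = Math.min(object.length, pos + requestSize);
        wrapped = new ByteArrayInputStream(object, (int) pos, (int) (rangeFinish - pos));
        currentPos = pos;
        reopenCount++;
    }

    /** Called before every read, even when targetPos == currentPos. */
    private void seekInStream(long targetPos) {
        // No seek needed, but the open request range has no data left either.
        boolean exhausted = wrapped != null && targetPos == currentPos && currentPos >= rangeFinish;
        if (wrapped == null || targetPos != currentPos || exhausted) {
            reopen(targetPos);
        }
    }

    /** Positioned read that may span several request ranges; returns bytes read. */
    public int readFully(long pos, byte[] buf) {
        int total = 0;
        while (total < buf.length && pos + total < object.length) {
            // Skipping this call when pos + total == currentPos is the bug: the old
            // stream would return -1 at the end of its range and the read comes back short.
            seekInStream(pos + total);
            int n = wrapped.read(buf, total, buf.length - total);
            if (n <= 0) {
                break;
            }
            total += n;
            currentPos += n;
        }
        return total;
    }

    public int reopenCount() {
        return reopenCount;
    }

    public static void main(String[] args) {
        BlockSpanningReadSketch s = new BlockSpanningReadSketch(buildTestData(1000), 256);
        byte[] buf = new byte[300];
        int n = s.readFully(200, buf);  // spans the request boundary at offset 456
        // prints "300 bytes, valid=true, reopens=2"
        System.out.println(n + " bytes, valid=" + verify(buf, 200) + ", reopens=" + s.reopenCount());
    }
}
```

A later sequential read that starts inside the still-open range (e.g. at offset 500 after the read above) does not trigger another request, so the check only costs a reopen when the current range really is exhausted.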

BTW, note that some of the fields in the input stream whose names ended in "Len" now refer to the
range start and finish. "Len" isn't appropriate now that the range of the HTTP request may be
smaller than the length of the actual blob, and it was getting confusing.
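To illustrate why a start/finish pair describes a request better than a single length, here is a small sketch of a request range that may be shorter than the object. The `RequestRange` class and its method names are hypothetical, not the field names used in the patch. Note that in HTTP's byte-range syntax the last byte position is inclusive, so an exclusive `finish` maps to `finish - 1` in the header.

```java
/**
 * Hypothetical sketch: a request covering [start, finish) of an object,
 * which may be smaller than the object's full contentLength.
 */
public final class RequestRange {
    public final long start;   // first byte requested (inclusive)
    public final long finish;  // end of the requested range (exclusive)

    public RequestRange(long start, long finish) {
        if (start < 0 || finish <= start) {
            throw new IllegalArgumentException("bad range: " + start + "-" + finish);
        }
        this.start = start;
        this.finish = finish;
    }

    /** Range for a read of len bytes at pos, clamped to the object length. */
    public static RequestRange forRead(long pos, long len, long contentLength) {
        if (pos < 0 || len <= 0 || pos >= contentLength) {
            throw new IllegalArgumentException("read outside object: pos=" + pos);
        }
        return new RequestRange(pos, Math.min(contentLength, pos + len));
    }

    public long length() {
        return finish - start;
    }

    /** Value for an HTTP Range header; the last byte position is inclusive. */
    public String toHttpRangeHeader() {
        return "bytes=" + start + "-" + (finish - 1);
    }

    public static void main(String[] args) {
        System.out.println(forRead(100, 50, 1000).toHttpRangeHeader());  // bytes=100-149
        System.out.println(forRead(990, 50, 1000).toHttpRangeHeader());  // bytes=990-999
    }
}
```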

> S3a: Consider reducing the number of connection aborts by setting correct length in s3 request
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13203
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13203
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: HADOOP-13203-branch-2-001.patch, HADOOP-13203-branch-2-002.patch,
> HADOOP-13203-branch-2-003.patch, HADOOP-13203-branch-2-004.patch, HADOOP-13203-branch-2-005.patch,
> HADOOP-13203-branch-2-006.patch, HADOOP-13203-branch-2-007.patch, HADOOP-13203-branch-2-008.patch,
> stream_stats.tar.gz
>
>
> Currently, the file's "contentLength" is used as the "requestedStreamLen" when invoking S3AInputStream::reopen(). As part of lazySeek(), the stream sometimes has to be closed and reopened, but it is often closed with abort(), which leaves the internal HTTP connection unusable. This incurs significant connection-establishment cost in some jobs. It would be good to set the correct value for the stream length to avoid connection aborts.
> I will post the patch once the AWS tests pass on my machine.
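A sketch of why the requested length matters when a stream is closed early, assuming (as an illustration, not something stated in the issue) that a stream is drained and closed when only a few bytes remain in its request, and aborted otherwise. `CloseOrAbortSketch`, `onClose` and `DRAIN_THRESHOLD` are hypothetical names and values. If the request length is the whole file, almost every early close leaves a large remainder and becomes an abort; if the request covers only what the read needed, the remainder is small and the connection can be kept.

```java
/**
 * Hypothetical sketch of the close-vs-abort choice for a ranged stream.
 * Draining a small remainder allows the HTTP connection to be reused;
 * aborting discards the connection.
 */
public final class CloseOrAbortSketch {
    enum Action { CLOSE, ABORT }

    // Assumed threshold for illustration only; not a value from the patch.
    static final long DRAIN_THRESHOLD = 64 * 1024;

    /** Decide what to do with a stream positioned at currentPos whose request ends at requestEnd. */
    static Action onClose(long currentPos, long requestEnd) {
        long remaining = Math.max(0, requestEnd - currentPos);
        return remaining <= DRAIN_THRESHOLD ? Action.CLOSE : Action.ABORT;
    }

    public static void main(String[] args) {
        long contentLength = 512L * 1024 * 1024;  // a 512 MB object
        long pos = 1_000_000;
        long readLen = 8_192;
        long consumed = pos + readLen;

        // Request length = whole object: nearly all of it is still unread.
        System.out.println(onClose(consumed, contentLength));  // ABORT
        // Request length = just what the read needed: nothing left to drain.
        System.out.println(onClose(consumed, pos + readLen));  // CLOSE
    }
}
```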



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


