hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13203) S3a: Consider reducing the number of connection aborts by setting correct length in s3 request
Date Thu, 02 Jun 2016 22:53:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313239#comment-15313239 ]

Chris Nauroth commented on HADOOP-13203:
----------------------------------------

Rajesh, thank you for the patch.  I have to apologize.  I think this might be a regression
that traces back to code review feedback I gave Steve on HADOOP-13028:

https://issues.apache.org/jira/browse/HADOOP-13028?focusedCommentId=15267729&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15267729

My thinking during the HADOOP-13028 patch was that we might want to keep on reading through
the same stream, regardless of the limit of the "current" read call, so we might as well request
the whole content in the HTTP request.  I was attempting to optimize away extraneous additional
HTTP calls.  It appears there was an unintended side effect.
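
To make the two strategies concrete, here is a minimal sketch of how the end offset of the HTTP range request could be chosen. It is not the S3A code: the class {{RequestRangeSketch}}, the method {{requestedEnd}}, and the {{wholeFile}} flag are hypothetical names used only for illustration.

```java
/** Hypothetical helper, not the S3A implementation. */
final class RequestRangeSketch {

    /**
     * Picks the exclusive end offset of the ranged GET issued when a stream is (re)opened.
     *
     * @param targetPos     offset at which the stream is being opened
     * @param readLength    number of bytes the current read call asked for
     * @param contentLength total size of the object
     * @param wholeFile     true: request through the end of the object, so later sequential
     *                      reads need no further HTTP calls; false: request only what the
     *                      current read needs
     */
    static long requestedEnd(long targetPos, long readLength, long contentLength,
                             boolean wholeFile) {
        if (targetPos < 0 || readLength < 0 || contentLength < 0 || targetPos > contentLength) {
            throw new IllegalArgumentException("invalid position or length");
        }
        if (wholeFile) {
            return contentLength;
        }
        // Written as a subtraction so that targetPos + readLength cannot overflow.
        if (readLength > contentLength - targetPos) {
            return contentLength;
        }
        return targetPos + readLength;
    }

    public static void main(String[] args) {
        long size = 1L << 30; // a 1 GiB object, chosen only for the demo
        System.out.println("whole file: " + requestedEnd(0, 65536, size, true));
        System.out.println("bounded:    " + requestedEnd(0, 65536, size, false));
    }
}
```

With {{wholeFile}} set, a 64 KiB read at offset 0 leaves almost the entire object unread on the connection if the caller then seeks elsewhere. With the bounded variant, nothing is left unread once the read completes.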

I want to make sure I understand the problem here fully.  Right now, I don't think I understand
why the aborts were happening.  Is it because requesting the full content, in combination
with Hive's random seek workloads, left the underlying HTTP connection untouched and idle
for a long time?  Then, after a while, was the HTTP connection deemed inactive or not fully
consumed, treated as some kind of client error, and the whole TCP connection shut down?

It's nice to see a comment on the {{requestedStreamLen}} calculation.  Thank you for adding
that.  I might ask for some further details to be added to that comment, after I feel like
I have a full understanding of the issue.

> S3a: Consider reducing the number of connection aborts by setting correct length in s3 request
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13203
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13203
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HADOOP-13203-branch-2-001.patch, HADOOP-13203-branch-2-002.patch
>
>
> Currently the file's "contentLength" is set as the "requestedStreamLen" when invoking S3AInputStream::reopen().
> As part of lazySeek(), the stream sometimes has to be closed and reopened. Often the stream
> is closed with abort(), which leaves the internal HTTP connection unusable. This incurs a
> high connection establishment cost in some jobs. It would be good to set the correct value
> for the stream length to avoid connection aborts.
> I will post the patch once the AWS tests pass on my machine.
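
The link between the requested stream length and aborts can be sketched as follows. This is an illustration under stated assumptions, not the patch: it assumes the stream is aborted when many requested bytes remain unread and is drained and closed normally otherwise. The class {{StreamCloseSketch}}, the method {{chooseCloseAction}}, and the {{drainThreshold}} parameter are hypothetical.

```java
/** Hypothetical sketch of a close-versus-abort decision, not the S3A implementation. */
final class StreamCloseSketch {

    enum CloseAction {
        /** Read the few remaining bytes, then close; the HTTP connection can be reused. */
        DRAIN_AND_CLOSE,
        /** Drop the connection; a new one must be established for the next request. */
        ABORT
    }

    /**
     * @param pos                current position in the stream
     * @param requestedStreamLen exclusive end offset that was requested when the stream was opened
     * @param drainThreshold     assumed cutoff: at most this many unread bytes are worth draining
     */
    static CloseAction chooseCloseAction(long pos, long requestedStreamLen, long drainThreshold) {
        if (pos < 0 || drainThreshold < 0 || pos > requestedStreamLen) {
            throw new IllegalArgumentException("invalid position or threshold");
        }
        long remaining = requestedStreamLen - pos;
        return remaining > drainThreshold ? CloseAction.ABORT : CloseAction.DRAIN_AND_CLOSE;
    }

    public static void main(String[] args) {
        long contentLength = 1L << 30; // 1 GiB object, demo value
        long pos = 65536;              // 64 KiB consumed before a seek forces a reopen
        long threshold = 16384;        // assumed drain cutoff, demo value

        // Requested length equals the whole file: almost everything is still unread.
        System.out.println(chooseCloseAction(pos, contentLength, threshold));
        // Requested length matches what was actually read: nothing is left to drain.
        System.out.println(chooseCloseAction(pos, pos, threshold));
    }
}
```

Under these assumptions, requesting the full "contentLength" makes the abort branch the common case for seek-heavy workloads, while a tighter "requestedStreamLen" keeps the remaining byte count small enough for a normal close.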



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

