hadoop-common-issues mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13047) S3a Forward seek in stream length to be configurable
Date Fri, 22 Apr 2016 13:15:13 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HADOOP-13047:
    Attachment: HADOOP-13047.WIP.patch

Attaching a high-level WIP patch. Based on gathered statistics on the amount of data
read so far and the time taken to connect, it should be possible to determine whether to establish
a new connection or to keep reading from the existing stream (like the case you pointed out earlier).
The WIP patch tries to address this scenario. It might not be possible to use something like
ReadAheadPool in Hadoop directly, as that is based on FileDescriptor.
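
As a rough illustration of that decision (not the code in the attached patch), the
sketch below compares the estimated cost of reading and discarding the skipped bytes
against the average observed connection setup time. The class name SeekCostEstimator,
its methods, and the fallback of reopening when no statistics exist are all assumptions
made for this example.

```java
/**
 * Hypothetical helper: decides between reading forward on the current
 * stream and opening a new connection, using observed statistics.
 * Names and policy are illustrative assumptions, not the patch itself.
 */
final class SeekCostEstimator {
    private long bytesRead;     // total bytes read so far
    private long readNanos;     // total time spent reading those bytes
    private long connectNanos;  // total time spent establishing connections
    private long connects;      // number of connections established

    /** Record a completed read of {@code bytes} bytes taking {@code nanos} ns. */
    void recordRead(long bytes, long nanos) {
        bytesRead += bytes;
        readNanos += nanos;
    }

    /** Record one connection setup that took {@code nanos} ns. */
    void recordConnect(long nanos) {
        connectNanos += nanos;
        connects++;
    }

    /**
     * @return true if skipping {@code skipBytes} by reading and discarding
     *         is estimated to be no slower than reopening the connection.
     */
    boolean shouldReadForward(long skipBytes) {
        if (skipBytes <= 0) {
            return true; // nothing to skip, stay on the current stream
        }
        if (bytesRead == 0 || connects == 0) {
            return false; // assumption: with no statistics, fall back to reopening
        }
        double nanosPerByte = (double) readNanos / bytesRead;
        double skipCost = nanosPerByte * skipBytes;
        double reopenCost = (double) connectNanos / connects;
        return skipCost <= reopenCost;
    }
}
```

With an observed rate of 1000 ns per byte and a 200 ms average connect time, this
estimate favours reading forward for a 100 KB skip and reopening for a 500 KB skip.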

> S3a Forward seek in stream length to be configurable
> ----------------------------------------------------
>                 Key: HADOOP-13047
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13047
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>         Attachments: HADOOP-13047.WIP.patch
> Even with lazy seek, tests can show that sometimes a short-distance forward seek is triggering
> a close + reopen, because the threshold for the seek is simply available bytes in the inner
> stream.
> A configurable threshold would allow data to be read and discarded before that seek.
> This should be beneficial over long-haul networks as the time to set up the TCP channel is
> high, and TCP slow start means that the ramp-up of bandwidth is slow. In such deployments,
> it will be better to read forward than re-open, though the exact "best" number will vary with
> client and endpoint.
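
A minimal sketch of the behaviour described in the issue, assuming a plain
java.io.InputStream as the inner stream. The class ForwardSeek, the method seekForward,
and the parameter name threshold are hypothetical and chosen for this example only;
they are not taken from the S3A code or from the patch.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration of a configurable forward-seek threshold. */
final class ForwardSeek {
    private ForwardSeek() {
    }

    /**
     * Try to move {@code diff} bytes forward by reading and discarding data.
     *
     * @param in        the already-open inner stream
     * @param diff      distance from the current position to the target
     * @param threshold configurable limit on how far to read forward
     * @return true if the seek was served from the open stream; false if the
     *         caller should close and reopen at the target position
     */
    static boolean seekForward(InputStream in, long diff, long threshold)
            throws IOException {
        if (diff < 0) {
            return false; // backward seeks always need a reopen
        }
        // Allow whichever is larger: the configured threshold or the bytes
        // that are already available without blocking.
        long limit = Math.max(threshold, in.available());
        if (diff > limit) {
            return false;
        }
        byte[] scratch = new byte[8192];
        long remaining = diff;
        while (remaining > 0) {
            int n = in.read(scratch, 0, (int) Math.min(scratch.length, remaining));
            if (n < 0) {
                return false; // hit end of stream early; let the caller reopen
            }
            remaining -= n;
        }
        return true;
    }
}
```

With a threshold of 0 this falls back to the current behaviour, where only the bytes
already available in the inner stream can be skipped without a reopen.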

This message was sent by Atlassian JIRA
