hadoop-common-issues mailing list archives

From "Dan Hecht (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11570) S3AInputStream.close() downloads the remaining bytes of the object from S3
Date Wed, 11 Feb 2015 16:02:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316416#comment-14316416 ]

Dan Hecht commented on HADOOP-11570:

Correct, the seek case already uses abort().  Additionally, the S3ObjectInputStream.abort()
documentation makes it clear that this is the expected tradeoff between abort() and close():

     * {@inheritDoc}
     * Aborts the underlying http request without reading any more data and
     * closes the stream.
     * <p>
     * By default Apache {@link HttpClient} tries to reuse http connections by
     * reading to the end of an attached input stream on
     * {@link InputStream#close()}. This is efficient from a socket pool
     * management perspective, but for objects with large payloads can incur
     * significant overhead while bytes are read from s3 and discarded. It's up
     * to clients to decide when to take the performance hit implicit in not
     * reusing an http connection in order to not read unnecessary information
     * from S3.
     * @see EofSensorInputStream

> S3AInputStream.close() downloads the remaining bytes of the object from S3
> --------------------------------------------------------------------------
>                 Key: HADOOP-11570
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11570
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Dan Hecht
>         Attachments: HADOOP-11570-001.patch
> Currently, S3AInputStream.close() calls S3Object.close().  But, S3Object.close() will
> read the remaining bytes of the S3 object, potentially transferring a lot of bytes from S3
> that are discarded.  Instead, the wrapped stream should be aborted to avoid transferring discarded
> bytes (unless the preceding read() finished at contentLength).  For example, reading only
> the first byte of a 1 GB object and then closing the stream will result in all 1 GB transferred
> from S3.
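
The close()/abort() tradeoff described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it is not the attached patch and it does not use the AWS SDK. FakeObjectStream is a hypothetical stand-in for S3ObjectInputStream that only counts bytes, and transferredAfter is a made-up helper. Here close() drains to the end of the object, while abort() reads nothing further.

```java
class AbortOnCloseSketch {

    /** Hypothetical stand-in for the SDK's S3ObjectInputStream; it only counts bytes. */
    static final class FakeObjectStream {
        private final long contentLength;
        private long transferred = 0; // bytes pulled over the simulated connection

        FakeObjectStream(long contentLength) {
            this.contentLength = contentLength;
        }

        /** Returns 0 for a data byte, or -1 once the whole object has been read. */
        int read() {
            if (transferred >= contentLength) {
                return -1;
            }
            transferred++;
            return 0;
        }

        /** Mimics close(): drain to EOF so the HTTP connection could be reused. */
        void close() {
            while (read() >= 0) {
                // discard the remaining bytes
            }
        }

        /** Mimics abort(): drop the connection without reading any more data. */
        void abort() {
            // nothing further is transferred
        }

        long transferred() {
            return transferred;
        }
    }

    /**
     * Reads up to bytesToRead bytes, then releases the stream.
     * With abortIfUnfinished set, abort() is used unless the reads already
     * reached contentLength, in which case close() has nothing left to drain.
     */
    static long transferredAfter(long contentLength, long bytesToRead,
                                 boolean abortIfUnfinished) {
        FakeObjectStream s = new FakeObjectStream(contentLength);
        long pos = 0;
        while (pos < bytesToRead && s.read() >= 0) {
            pos++;
        }
        if (abortIfUnfinished && pos < contentLength) {
            s.abort();
        } else {
            s.close();
        }
        return s.transferred();
    }

    public static void main(String[] args) {
        long size = 1L << 20; // 1 MiB simulated object
        System.out.println("close(): " + transferredAfter(size, 1, false));
        System.out.println("abort(): " + transferredAfter(size, 1, true));
    }
}
```

In this simulation, reading one byte and then calling close() transfers the whole object, while the abort() path transfers only the byte that was read. What the model leaves out is the other side of the tradeoff: an aborted HTTP connection cannot be returned to the pool for reuse.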

This message was sent by Atlassian JIRA
