apex-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2174) S3 File Reader reading more data than expected
Date Wed, 03 Aug 2016 10:16:20 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405674#comment-15405674 ]

ASF GitHub Bot commented on APEXMALHAR-2174:

GitHub user chaithu14 opened a pull request:


    APEXMALHAR-2174-S3-ReaderIssue Fixed the S3 reader issue


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chaithu14/incubator-apex-malhar APEXMALHAR-2174-S3-ReaderIssue

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #360
commit 01b0f42d1d0ab2e6030e390e10e1dafba72f3302
Author: Chaitanya <chaitanya@datatorrent.com>
Date:   2016-08-03T10:13:30Z

    APEXMALHAR-2174-S3-ReaderIssue Fixed the S3 reader issue


> S3 File Reader reading more data than expected
> ----------------------------------------------
>                 Key: APEXMALHAR-2174
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2174
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Chaitanya
>            Assignee: Chaitanya
> This was observed through AWS billing.
> The issue is likely in S3InputStream.read(), which is used in readEntity().
> Reading a block can be achieved through the AmazonS3 API, so I am proposing the
> following solution:
> ```
> // Assumes the AWS SDK for Java v1 (com.amazonaws.services.s3.model.*)
> // and Guava's ByteStreams for draining the stream.
> GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key);
> // setRange(start, end) takes an inclusive end byte offset, not a byte count
> rangeObjectRequest.setRange(startByte, startByte + noOfBytes - 1);
> S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
> S3ObjectInputStream wrappedStream = objectPortion.getObjectContent();
> byte[] record = ByteStreams.toByteArray(wrappedStream);
> ```
> Advantage of this solution: parallel reads will work for all types of S3 file systems.
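The subtlety in the proposal above is that GetObjectRequest.setRange(long start, long end) in the AWS SDK for Java expects an inclusive end byte offset, while the reader works in terms of a start offset and a byte count. A minimal sketch of that conversion, under the assumption that the reader tracks (startByte, noOfBytes); the helper name is illustrative, not from the patch:

```java
public class S3RangeHelper {
  // Convert (startByte, noOfBytes) into the inclusive [start, end] pair
  // expected by GetObjectRequest.setRange(long start, long end).
  static long[] toInclusiveRange(long startByte, long noOfBytes) {
    if (startByte < 0 || noOfBytes <= 0) {
      throw new IllegalArgumentException("startByte must be >= 0 and noOfBytes must be positive");
    }
    return new long[] { startByte, startByte + noOfBytes - 1 };
  }

  public static void main(String[] args) {
    // e.g. a 1 KiB block starting at offset 0 maps to bytes 0..1023
    long[] range = toInclusiveRange(0, 1024);
    System.out.println(range[0] + "-" + range[1]); // prints "0-1023"
  }
}
```

Passing noOfBytes directly as the second argument would fetch roughly one extra block per request, which is consistent with the over-reading seen in the AWS billing.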

This message was sent by Atlassian JIRA
