hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14423) s3guard will set file length to -1 on a putObjectDirect(stream, -1) call
Date Wed, 17 May 2017 12:30:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013939#comment-16013939 ]

Steve Loughran commented on HADOOP-14423:
-----------------------------------------

You don't get the upload length from the PutObjectResult; the content-length it returns is
the length of the response, not of the data uploaded. You *may* get it through the progress
callbacks.

Options:
# Don't allow -1 as a length in a PUT.
# If a PUT passes in a stream and a length of -1, do a GET afterwards to determine its length. This is expensive and, if overwriting an existing object, not guaranteed to be correct.
# Use progress callbacks. This should be a consistent path for all uploads.

I'm going with option one. A PUT is initiated this way in only two places: the PUT
at the end of a block write in BlockOutputStream, and the local file upload in the s3guard
committer. Both code paths know the length.
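
Option one amounts to a precondition check before the PUT is started. The following is a minimal sketch of that idea only, not the actual Hadoop patch: the class {{PutLengthCheck}} and the method {{requireKnownLength}} are hypothetical names invented for illustration, and the real change may validate the length elsewhere.

```java
// Hypothetical illustration of option one: refuse to start a PUT whose
// length is unknown (negative), so later bookkeeping never sees -1.
class PutLengthCheck {

    /**
     * Returns the length unchanged if it is known (>= 0),
     * otherwise throws IllegalArgumentException.
     */
    static long requireKnownLength(long length) {
        if (length < 0) {
            throw new IllegalArgumentException(
                "PUT length must be known and non-negative, got " + length);
        }
        return length;
    }

    public static void main(String[] args) {
        // A caller that knows its length passes straight through.
        System.out.println(requireKnownLength(1024L));

        // A caller passing -1 ("until the end of the stream") is rejected.
        try {
            requireKnownLength(-1L);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast in this way keeps the length recorded after the write equal to the length the caller declared, at the cost of requiring every caller to know its length up front.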


> s3guard will set file length to -1 on a putObjectDirect(stream, -1) call
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-14423
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14423
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Steve Loughran
>            Priority: Minor
>
> You can pass a negative number into {{S3AFileSystem.putObjectDirect}}, which means "put
> until the end of the stream". S3guard has been using this {{len}} argument; it needs to
> use the actual number of bytes uploaded. This is also relevant with client-side encryption,
> when the amount of data put > the amount of data in the file or stream.
> Noted in the committer branch after I added some more assertions. I've changed it there,
> making changes to S3AFS.putObjectDirect to pull the content length to pass in to finishedWrite()
> from the {{PutObjectResult}} instead. This can be picked into the s3guard branch.
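
To illustrate what "the actual number of bytes" means when a caller passes -1, here is a minimal plain-Java sketch that counts bytes as they are read from the source stream. It is an assumption-laden stand-in, not the S3A or AWS SDK mechanism: {{CountingInputStream}} and {{CountingDemo}} are hypothetical names, and the loop in {{drain}} merely simulates an upload consuming the stream. It counts source bytes only, so it would not reflect any extra bytes added by client-side encryption.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper that records how many bytes were actually read.
// Bytes passed over with skip() are deliberately not counted.
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    long getCount() {
        return count;
    }
}

class CountingDemo {
    // Simulates an upload of unknown length: read to end of stream,
    // then report the number of bytes that were consumed.
    static long drain(InputStream in) throws IOException {
        CountingInputStream counting = new CountingInputStream(in);
        byte[] buf = new byte[4096];
        while (counting.read(buf, 0, buf.length) != -1) {
            // discard the data; only the count matters here
        }
        return counting.getCount();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(new ByteArrayInputStream(new byte[10_000])));
    }
}
```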



