mxnet-dev mailing list archives

From Chris Olivier <cjolivie...@gmail.com>
Subject Re: S3 Writes using SIG4 Authentication
Date Wed, 07 Mar 2018 05:04:40 GMT
It seems strange that S3 would impose such a major restriction. Is there
literally no way to incrementally write a file without knowing the size
beforehand? Some sort of separate append calls, maybe?

On Tue, Mar 6, 2018 at 8:53 PM Rahul Huilgol <rahulhuilgol@gmail.com> wrote:

> Hi everyone,
>
> I have been looking at updating the authentication used by S3FileSystem in
> dmlc-core. The current code uses Signature Version 2, which now works only
> in the us-east-1 region. We need to update the authentication scheme to use
> Signature Version 4 (SIG4).
>
> I've submitted a PR <https://github.com/dmlc/dmlc-core/pull/378> to change
> this for Reads. But I wanted to seek out thoughts on what to do for Writes,
> as there is a potential problem.
>
> *How writes to S3 work currently:*
> Whenever S3FileSystem's stream.write() is called, the data is buffered. When
> the buffer is full, a request is made to S3. Since this can happen multiple
> times, the multipart upload feature is used. An upload ID is created when
> the stream is initialized and is used until the stream is closed.
> The default buffer size is 64MB.
>
> *Problem:*
> The new SIG4 authentication scheme changes how multipart uploads work. Such
> an upload now requires that we know the total size of the data to be sent
> (the sum of the sizes of all parts) when we create the very first request,
> because the total payload size must be passed in a header. This is not
> possible given that we don't know all the write calls beforehand. For
> example, a call to save a model's parameters makes 145 calls to the
> stream's write.
>
> *Approach?*
> Is it okay to buffer the data to a local file, and then upload this file to
> S3 at the end?
> What use cases do we have for writes to S3 generally? I believe we would
> want to write params after training, or logs. These wouldn't be too large
> or frequent, I imagine. What would you suggest?
>
> Appreciate your thoughts and suggestions.
>
> Thanks,
> Rahul Huilgol
>

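For readers following the thread, the two write strategies discussed in the quoted message can be contrasted with a rough sketch. This is not the dmlc-core implementation (which is C++) and it makes no S3 calls: the class names and the upload_part / upload_file callbacks are hypothetical stand-ins for whatever actually sends the data. The first class mimics the buffering behaviour described above in simplified form (emit a part each time the buffer fills); the second spools everything to a local temporary file so that the total size is known before any upload begins.

```python
import os
import tempfile


class PartBufferedWriter:
    """Simplified model of the current approach: buffer writes, emit a part when full."""

    def __init__(self, upload_part, part_size=64 * 1024 * 1024):
        # upload_part is a hypothetical callback taking (part_number, data).
        self._upload_part = upload_part
        self._part_size = part_size
        self._buffer = bytearray()
        self._part_number = 0

    def write(self, data):
        self._buffer.extend(data)
        # Emit as many full parts as the buffer currently holds.
        while len(self._buffer) >= self._part_size:
            self._flush(self._part_size)

    def close(self):
        # Whatever remains becomes the final (possibly short) part.
        if self._buffer:
            self._flush(len(self._buffer))

    def _flush(self, size):
        self._part_number += 1
        self._upload_part(self._part_number, bytes(self._buffer[:size]))
        del self._buffer[:size]


class SpoolingWriter:
    """Proposed alternative: spool to a local file, upload once at close."""

    def __init__(self, upload_file):
        # upload_file is a hypothetical callback taking (path, total_size).
        self._upload_file = upload_file
        self._tmp = tempfile.NamedTemporaryFile(delete=False)

    def write(self, data):
        self._tmp.write(data)

    def close(self):
        self._tmp.close()
        path = self._tmp.name
        try:
            # The total size is known here, before the upload starts.
            self._upload_file(path, os.path.getsize(path))
        finally:
            os.remove(path)
```

With the spooling variant, the number of write() calls no longer matters: the caller can issue 145 small writes and the uploader still sees a single file of known length. The cost is local disk space and an upload that only starts once the stream is closed.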