hadoop-common-issues mailing list archives

From "Yann Landrin-Schweitzer (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-12020) Support AWS S3 reduced redundancy storage class
Date Fri, 22 May 2015 19:56:17 GMT
Yann Landrin-Schweitzer created HADOOP-12020:

             Summary: Support AWS S3 reduced redundancy storage class
                 Key: HADOOP-12020
                 URL: https://issues.apache.org/jira/browse/HADOOP-12020
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs/s3
    Affects Versions: 2.7.0
         Environment: Hadoop on AWS
            Reporter: Yann Landrin-Schweitzer

Amazon S3 uses, by default, the STANDARD storage class for S3 objects.
This offers, according to Amazon's documentation, 99.999999999% durability.
For many applications, however, the 99.99% durability offered by the REDUCED_REDUNDANCY storage
class is amply sufficient, and it comes with a significant cost saving.

HDFS, whether using the legacy s3n protocol or the newer s3a scheme, should support overriding
the default storage class of created S3 objects so that users can take advantage of this cost
saving.
This would require minor changes to the s3n and s3a drivers, using
a configuration property fs.s3n.storage.class (and an analogous property for s3a) to override the default storage class when desired.
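As a sketch, the proposed property could be set in core-site.xml as follows. This is a hypothetical example of the setting proposed in this issue, not something currently supported; the value is the AWS storage class identifier.

```xml
<!-- Hypothetical example of the proposed setting -->
<property>
  <name>fs.s3n.storage.class</name>
  <value>REDUCED_REDUNDANCY</value>
</property>
```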

This override could be implemented in Jets3tNativeFileSystemStore with:

      S3Object object = new S3Object(key);
      if (storageClass != null) {
          object.setStorageClass(storageClass);
      }

It would take a more complex form in s3a, e.g. setting:

    InitiateMultipartUploadRequest initiateMPURequest =
        new InitiateMultipartUploadRequest(bucket, key, om);
    if (storageClass != null) {
        initiateMPURequest = initiateMPURequest.withStorageClass(storageClass);
    }

and similar statements in various places.
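Since the storage class would come from user configuration, whichever driver is changed would likely also want to validate the value before handing it to jets3t or the AWS SDK. A minimal sketch of such a check follows; the class and method names are illustrative only, not part of the existing Hadoop code.

```java
// Hypothetical helper: normalizes and validates a user-supplied storage
// class string before it is passed on to the S3 client libraries.
public class StorageClassValidator {

    /** Returns the normalized storage class, defaulting to STANDARD. */
    public static String resolve(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return "STANDARD"; // preserve today's default behaviour
        }
        String normalized = configured.trim().toUpperCase();
        switch (normalized) {
            case "STANDARD":
            case "REDUCED_REDUNDANCY":
                return normalized;
            default:
                throw new IllegalArgumentException(
                    "Unsupported S3 storage class: " + configured);
        }
    }
}
```

An invalid value would then fail fast at filesystem initialization rather than surfacing later as an S3 request error.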

This message was sent by Atlassian JIRA
