hadoop-common-issues mailing list archives

From "Kaidi Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-14710) Uber-JIRA: Support AWS Snowball
Date Tue, 11 Sep 2018 00:58:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609969#comment-16609969
] 

Kaidi Zhao edited comment on HADOOP-14710 at 9/11/18 12:57 AM:
---------------------------------------------------------------

The S3 endpoint exposed by Snowball or Snowball Edge supports only a subset of the regular S3 API. For
example, for Snowball Edge, see: 

[https://docs.aws.amazon.com/snowball/latest/developer-guide/using-adapter-supported-api.html] 

I gave it a try using hadoop 2.7.5 as well as 2.8.4, and noticed that many common
commands do not work against it. 
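For reference, this is roughly the S3A client configuration I used to point Hadoop at the Snowball Edge adapter. The endpoint address and port below are placeholders for whatever your device reports; the property names themselves are standard S3A options (fs.s3a.path.style.access needs 2.8+):

```xml
<!-- core-site.xml: point S3A at the Snowball Edge S3 adapter.
     The endpoint host/port are placeholders; use the values your device reports. -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>https://192.0.2.1:8443</value>
</property>
<property>
  <!-- the Snowball adapter does not serve virtual-hosted-style bucket URLs -->
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
</property>
```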

1) I tried distcp from hadoop into Snowball Edge's S3. It looks like hadoop uses "PUT
Object - Copy" when moving the temporary file to the final file, but this REST API is not
supported by Snowball's S3, so it errors out. 
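This is not Hadoop code, but a toy sketch of why a backend without object-copy breaks that step: S3 has no rename operation, so the client commits output by copying the temporary object to its final key and then deleting the original. If the backend rejects the copy call, the commit fails even though the upload itself succeeded. (The FakeStore class and its method names are invented for illustration.)

```python
class CopyNotSupported(Exception):
    """Stand-in for the error a backend returns for an unsupported API call."""

class FakeStore:
    """Toy object store; supports_copy=False mimics the Snowball adapter."""
    def __init__(self, supports_copy=True):
        self.objects = {}
        self.supports_copy = supports_copy

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):  # corresponds to "PUT Object - Copy"
        if not self.supports_copy:
            raise CopyNotSupported("PUT Object - Copy not supported")
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        del self.objects[key]

    def rename(self, src, dst):
        # No native rename on S3: emulated as copy + delete.
        self.copy(src, dst)
        self.delete(src)

store = FakeStore(supports_copy=False)
store.put("bucket/.distcp.tmp.part-0000", b"data")  # the upload succeeds...
try:
    store.rename("bucket/.distcp.tmp.part-0000", "bucket/part-0000")
except CopyNotSupported as e:
    print("commit failed:", e)  # ...but the final rename/commit fails
```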

2) I also tried commands like hadoop fs -ls s3a://xyz/; the command retries a number of times,
then errors out with something like: 

ls: listStatus on s3a://xyz/: com.amazonaws.AmazonClientException: Unable to execute HTTP
request: Read timed out. 

2a) Strangely enough, with hadoop debug logging on, I can clearly see the "ListBucketResult" object
is actually returned, so I guess it errors out somewhere else. 

2b) Also, if I use "s3a://xyz" instead (no trailing slash), then the error is: ls:
's3a://xyz': No such file or directory.
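While debugging the timeouts in 2), it helps to make the client fail fast instead of retrying for a long time. These are standard S3A tuning properties; the values here are just what I would try for debugging, not recommendations:

```xml
<!-- Debug aid: surface the failure immediately instead of retrying. -->
<property>
  <name>fs.s3a.attempts.maximum</name>
  <value>1</value>
</property>
<property>
  <!-- socket/connection timeout, in milliseconds -->
  <name>fs.s3a.connection.timeout</name>
  <value>5000</value>
</property>
```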

 

In short, I don't see any way to copy data directly from HDFS into Snowball Edge's S3. 

 



> Uber-JIRA: Support AWS Snowball
> -------------------------------
>
>                 Key: HADOOP-14710
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14710
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: John Zhuge
>            Assignee: John Zhuge
>            Priority: Major
>
> Support data transfer between Hadoop and [AWS Snowball|http://docs.aws.amazon.com/snowball/latest/ug/whatissnowball.html].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

