hadoop-common-user mailing list archives

From "Stuart Sierra" <m...@stuartsierra.com>
Subject Re: Namenode Exceptions with S3
Date Wed, 09 Jul 2008 19:27:42 GMT
I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/').

With distcp, I found that using the URL format s3://ID:SECRET@BUCKET/
did not work, even if I encoded the slash as "%2F".  I got
"org.jets3t.service.S3ServiceException: S3 HEAD request failed.
ResponseCode=403, ResponseMessage=Forbidden"

When I put the AWS Secret Key in hadoop-site.xml and wrote the URL as
s3://BUCKET/ it worked.
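
For reference, the working combination looked roughly like this (the
key values, bucket name, paths, and namenode address below are
placeholders, not my real settings):

    <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY_ID</value>
    </property>

    <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>YOUR_SECRET_KEY_CONTAINING_A_SLASH</value>
    </property>

and then invoking distcp with no credentials in the URL:

    bin/hadoop distcp hdfs://namenode:9000/data s3://BUCKET/data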

I have periods ('.') in my bucket name; that was not a problem.

What's weird is that org.apache.hadoop.fs.s3.Jets3tFileSystemStore
uses java.net.URI, which should take care of decoding the %2F.
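
Here is a quick standalone check of that behavior (the key in the URI
is made up):

    import java.net.URI;

    public class UriDecodeDemo {
        public static void main(String[] args) throws Exception {
            // java.net.URI decodes percent-escapes in the user-info
            // component, so %2F should come back as a literal slash.
            URI uri = new URI("s3://ID:SECRET%2FKEY@bucket/");
            System.out.println(uri.getUserInfo());    // ID:SECRET/KEY
            System.out.println(uri.getRawUserInfo()); // ID:SECRET%2FKEY
        }
    }

So the escaped form should be decoded correctly, which makes the 403
all the more puzzling.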

-Stuart


On Wed, Jul 9, 2008 at 1:41 PM, Lincoln Ritter
<lincoln@lincolnritter.com> wrote:
> So far, I've had no luck.
>
> Can anyone out there clarify the permissible characters/format for aws
> keys and bucket names?
>
> I haven't looked at the code here, but it seems strange to me that the
> same restrictions on host/port etc apply given that it's a totally
> different system.  I'd love to see exceptions thrown that are
> particular to the protocol/subsystem being employed.  The s3 'handler'
> (or whatever) might be nice enough to check for format violations and
> throw an appropriate exception, for instance.  It might URL-encode
> the secret key so that the user doesn't have to worry about this, or
> throw an exception notifying the user of a bad format.  Currently,
> apparent problems with my s3 settings are throwing exceptions that
> give no indication that the problem is actually with those settings.
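>
> Something along these lines, say (a hypothetical sketch, not code
> that exists in Hadoop today):
>
>     import java.io.UnsupportedEncodingException;
>     import java.net.URLEncoder;
>
>     // Hypothetical helper: percent-encode an AWS secret key before
>     // embedding it in an s3:// URI, so that a '/' in the key cannot
>     // be mistaken for URI structure.
>     public class S3SecretEscaper {
>         public static String escapeSecret(String secret) {
>             try {
>                 return URLEncoder.encode(secret, "UTF-8");
>             } catch (UnsupportedEncodingException e) {
>                 throw new RuntimeException("UTF-8 is always available", e);
>             }
>         }
>     }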
>
> My mitigating strategy has been to change my configuration to use
> "instance-local" storage (/mnt).  I then copy the results out to s3
> using 'distcp'.  This is odd since distcp seems ok with my s3/aws
> info.
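>
> Concretely, that copy step is just something like this (the local
> path and bucket name here are made up):
>
>     bin/hadoop distcp file:///mnt/job-output s3://BUCKET/job-output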
>
> I'm still unclear as to the permissible characters in bucket names and
> access keys.  I gather '/' is bad in the secret key and that '_' is
> bad for bucket names.  Thus far I have only been able to get buckets to
> work in distcp that have only letters in their names, but I haven't
> tested too extensively.
>
> For example, I'd love to use buckets like:
> 'com.organization.hdfs.purpose'.  This seems to fail.  Using
> 'comorganizationhdfspurpose' works but clearly that is less than
> optimal.
>
> Like I say, I haven't dug into the source yet, but it is curious that
> distcp seems to work (at least where s3 is the destination) and hadoop
> fails when s3 is used as its storage.
>
> Anyone who has dealt with these issues, please post!  It will help
> make the project better.
>
> -lincoln
>
> --
> lincolnritter.com
>
>
>
> On Wed, Jul 9, 2008 at 7:10 AM, slitz <slitzferrari@gmail.com> wrote:
>> I'm having the exact same problem, any tip?
>>
>> slitz
>>
>> On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter <lincoln@lincolnritter.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I am trying to use S3 with Hadoop 0.17.0 on EC2.  Using this style of
>>> configuration:
>>>
>>> <property>
>>>  <name>fs.default.name</name>
>>>  <value>s3://$HDFS_BUCKET</value>
>>> </property>
>>>
>>> <property>
>>>  <name>fs.s3.awsAccessKeyId</name>
>>>  <value>$AWS_ACCESS_KEY_ID</value>
>>> </property>
>>>
>>> <property>
>>>  <name>fs.s3.awsSecretAccessKey</name>
>>>  <value>$AWS_SECRET_ACCESS_KEY</value>
>>> </property>
>>>
>>> on startup of the cluster, with a bucket name containing only
>>> alphabetic characters, I get:
>>>
>>> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
>>> java.lang.RuntimeException: Not a host:port pair: XXXXX
>>>        at
>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>
>>> If I use this style of configuration:
>>>
>>> <property>
>>>  <name>fs.default.name</name>
>>>  <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
>>> </property>
>>>
>>> I get (where the all-caps portions are the actual values...):
>>>
>>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>>> java.lang.NumberFormatException: For input string:
>>> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>>>        at
>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>        at java.lang.Integer.parseInt(Integer.java:447)
>>>        at java.lang.Integer.parseInt(Integer.java:497)
>>>        at
>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>
>>> These exceptions are taken from the namenode log.  The datanode logs
>>> show the same exceptions.
>>>
>>> Other than the above configuration changes, the configuration is
>>> identical to that generated by the hadoop image creation script found
>>> in the 0.17.0 distribution.
>>>
>>> Can anybody point me in the right direction here?
>>>
>>> -lincoln
>>>
>>> --
>>> lincolnritter.com
>>>
>>
>
