hadoop-common-user mailing list archives

From "Lincoln Ritter" <linc...@lincolnritter.com>
Subject Re: slash in AWS Secret Key, WAS Re: Namenode Exceptions with S3
Date Wed, 09 Jul 2008 19:47:54 GMT
Thanks for the reply.

I've heard the "regenerate" suggestion before, but for organizations
that have their AWS info spread all over the place this is a huge
pain.  I think it would be better to come up with a more robust
solution to handling AWS info.

-lincoln

--
lincolnritter.com



On Wed, Jul 9, 2008 at 12:44 PM, Jimmy Lin <jimmylin@umd.edu> wrote:
> I've come across this problem before.  My simple solution was to
> regenerate new keys until I got one without a slash... ;)
>
> -Jimmy
>
>> I have Hadoop 0.17.1 and an AWS Secret Key that contains a slash ('/').
>>
>> With distcp, I found that using the URL format s3://ID:SECRET@BUCKET/
>> did not work, even if I encoded the slash as "%2F".  I got
>> "org.jets3t.service.S3ServiceException: S3 HEAD request failed.
>> ResponseCode=403, ResponseMessage=Forbidden"
>>
>> When I put the AWS Secret Key in hadoop-site.xml and wrote the URL as
>> s3://BUCKET/ it worked.
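>>
>> For reference, this is roughly the setup that worked for me (the values
>> here are placeholders; the property names are the ones from the config
>> further down the thread):
>>
>>   <property>
>>     <name>fs.s3.awsAccessKeyId</name>
>>     <value>YOUR_ACCESS_KEY_ID</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3.awsSecretAccessKey</name>
>>     <value>YOUR_SECRET_KEY_WITH_SLASH</value>
>>   </property>
>>
>> and then something like:
>>
>>   bin/hadoop distcp hdfs://NAMENODE:PORT/source s3://BUCKET/dest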
>>
>> I have periods ('.') in my bucket name; that was not a problem.
>>
>> What's weird is that org.apache.hadoop.fs.s3.Jets3tFileSystemStore
>> uses java.net.URI, which should take care of decoding the %2F.
>>
>> -Stuart
>>
>>
>> On Wed, Jul 9, 2008 at 1:41 PM, Lincoln Ritter
>> <lincoln@lincolnritter.com> wrote:
>>> So far, I've had no luck.
>>>
>>> Can anyone out there clarify the permissible characters/format for aws
>>> keys and bucket names?
>>>
>>> I haven't looked at the code here, but it seems strange to me that the
>>> same restrictions on host/port etc apply given that it's a totally
>>> different system.  I'd love to see exceptions thrown that are
>>> particular to the protocol/subsystem being employed.  The s3 'handler'
>>> (or whatever) might be nice enough to check for format violations and
>>> throw an appropriate exception, for instance.  It might URL-encode
>>> the secret key so that the user doesn't have to worry about this, or
>>> throw an exception notifying the user of a bad format.  Currently,
>>> apparent problems with my s3 settings are throwing exceptions that
>>> give no indication that the problem is actually with those settings.
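>>>
>>> Just to illustrate what I mean, the handler could do something along
>>> these lines (a hypothetical sketch, not actual Hadoop code):
>>>
>>>   import java.io.UnsupportedEncodingException;
>>>   import java.net.URLEncoder;
>>>
>>>   // hypothetical helper: escape characters like '/' in the secret key
>>>   // before it gets embedded in the s3:// URI
>>>   static String s3Uri(String accessKey, String secret, String bucket)
>>>       throws UnsupportedEncodingException {
>>>     String encodedSecret = URLEncoder.encode(secret, "UTF-8");
>>>     return "s3://" + accessKey + ":" + encodedSecret + "@" + bucket + "/";
>>>   }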
>>>
>>> My mitigating strategy has been to change my configuration to use
>>> "instance-local" storage (/mnt).  I then copy the results out to s3
>>> using 'distcp'.  This is odd since distcp seems ok with my s3/aws
>>> info.
>>>
>>> I'm still unclear as to the permissible characters in bucket names and
>>> access keys.  I gather '/' is bad in the secret key and that '_' is
>>> bad for bucket names.  Thus far I have only been able to get buckets to
>>> work in distcp that have only letters in their names, but I haven't
>>> tested too extensively.
>>>
>>> For example, I'd love to use buckets like:
>>> 'com.organization.hdfs.purpose'.  This seems to fail.  Using
>>> 'comorganizationhdfspurpose' works but clearly that is less than
>>> optimal.
>>>
>>> Like I say, I haven't dug into the source yet, but it is curious that
>>> distcp seems to work (at least where s3 is the destination) and hadoop
>>> fails when s3 is used as its storage.
>>>
>>> Anyone who has dealt with these issues, please post!  It will help
>>> make the project better.
>>>
>>> -lincoln
>>>
>>> --
>>> lincolnritter.com
>>>
>>>
>>>
>>> On Wed, Jul 9, 2008 at 7:10 AM, slitz <slitzferrari@gmail.com> wrote:
>>>> I'm having the exact same problem, any tip?
>>>>
>>>> slitz
>>>>
>>>> On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter
>>>> <lincoln@lincolnritter.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to use S3 with Hadoop 0.17.0 on EC2.  Using this style of
>>>>> configuration:
>>>>>
>>>>> <property>
>>>>>  <name>fs.default.name</name>
>>>>>  <value>s3://$HDFS_BUCKET</value>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>>  <name>fs.s3.awsAccessKeyId</name>
>>>>>  <value>$AWS_ACCESS_KEY_ID</value>
>>>>> </property>
>>>>>
>>>>> <property>
>>>>>  <name>fs.s3.awsSecretAccessKey</name>
>>>>>  <value>$AWS_SECRET_ACCESS_KEY</value>
>>>>> </property>
>>>>>
>>>>> on startup of the cluster, with a bucket name containing no
>>>>> non-alphabetic characters, I get:
>>>>>
>>>>> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
>>>>> java.lang.RuntimeException: Not a host:port pair: XXXXX
>>>>>        at
>>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>>>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>>>        at
>>>>> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>>>
>>>>> If I use this style of configuration:
>>>>>
>>>>> <property>
>>>>>  <name>fs.default.name</name>
>>>>>  <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
>>>>> </property>
>>>>>
>>>>> I get (where the all-caps portions are the actual values...):
>>>>>
>>>>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>>>>> java.lang.NumberFormatException: For input string:
>>>>> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>>>>>        at
>>>>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>>>>        at java.lang.Integer.parseInt(Integer.java:447)
>>>>>        at java.lang.Integer.parseInt(Integer.java:497)
>>>>>        at
>>>>> org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>>>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>>>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>>>>        at
>>>>> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>>>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>>>>
>>>>> These exceptions are taken from the namenode log.  The datanode logs
>>>>> show the same exceptions.
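>>>>>
>>>>> From the traces it looks like the value of fs.default.name is being
>>>>> parsed as a plain host:port pair, roughly like this (my guess from the
>>>>> stack trace, not the actual NetUtils source):
>>>>>
>>>>>   // authority of s3://KEY:SECRET@BUCKET as the namenode sees it
>>>>>   String authority = "AWS_ACCESS_KEY:AWS_SECRET_ACCESS_KEY@HDFS_BUCKET";
>>>>>   int colon = authority.indexOf(':');
>>>>>   if (colon < 0) {
>>>>>     // the s3://BUCKET case: no colon at all
>>>>>     throw new RuntimeException("Not a host:port pair: " + authority);
>>>>>   }
>>>>>   String host = authority.substring(0, colon);
>>>>>   // everything after the colon is treated as a port number, so
>>>>>   // "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET" blows up in Integer.parseInt
>>>>>   int port = Integer.parseInt(authority.substring(colon + 1));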
>>>>>
>>>>> Other than the above configuration changes, the configuration is
>>>>> identical to that generated by the hadoop image creation script found
>>>>> in the 0.17.0 distribution.
>>>>>
>>>>> Can anybody point me in the right direction here?
>>>>>
>>>>> -lincoln
>>>>>
>>>>> --
>>>>> lincolnritter.com
>>>>>
>>>>
>>>
>>
>>
>
>
>
