hadoop-common-user mailing list archives

From "Lincoln Ritter" <linc...@lincolnritter.com>
Subject Re: Namenode Exceptions with S3
Date Wed, 09 Jul 2008 17:41:54 GMT
So far, I've had no luck.

Can anyone out there clarify the permissible characters/format for aws
keys and bucket names?

I haven't looked at the code here, but it seems strange to me that the
same restrictions on host/port, etc., apply given that it's a totally
different system.  I'd love to see exceptions thrown that are
particular to the protocol/subsystem being employed.  The s3 'handler'
(or whatever it's called) might be nice enough to check for format
violations and throw an appropriate exception, for instance.  It might
URL-encode the secret key so that the user doesn't have to worry about
this, or throw an exception notifying the user of the bad format.
Currently, apparent problems with my s3 settings produce exceptions
that give no indication that the problem is actually with those
settings.
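
For example, suppose the secret key were 'abc/def123' (a made-up
value).  Embedded raw in a URI, the '/' throws off the parsing;
escaping it by hand as %2F gets around that:

  s3://$AWS_ACCESS_KEY_ID:abc%2Fdef123@$HDFS_BUCKET

That escaping is exactly the sort of thing the s3 handler could do on
the user's behalf.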

My mitigating strategy has been to change my configuration to use
"instance-local" storage (/mnt).  I then copy the results out to s3
using 'distcp'.  This is odd since distcp seems ok with my s3/aws
info.
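
In case it helps anyone replicate the workaround, the shape of it is
(the host, paths, and bucket below are placeholders, not my real
values):

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/hadoop</value>
</property>

and then, once a job has finished:

bin/hadoop distcp hdfs://localhost:9000/user/root/output \
  s3://$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET/output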

I'm still unclear as to the permissible characters in bucket names and
access keys.  I gather '/' is bad in the secret key and that '_' is
bad in bucket names.  Thus far I have only been able to get buckets
whose names contain only letters to work with distcp, but I haven't
tested extensively.

For example, I'd love to use bucket names like
'com.organization.hdfs.purpose'.  That seems to fail.  Using
'comorganizationhdfspurpose' works, but that is clearly less than
optimal.

Like I say, I haven't dug into the source yet, but it is curious that
distcp seems to work (at least with s3 as the destination) while
hadoop fails when s3 is used as its storage.  My guess, going only by
the stack traces quoted below, is that the namenode tries to parse the
fs.default.name value as a host:port pair for its own use, so any
s3:// value trips it up, whereas distcp just treats the s3 URI as a
client-side filesystem.
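
A toy illustration of that guess (this is not the actual Hadoop code,
just the simplest thing I can write that reproduces both traces
quoted below):

import java.net.URI;

public class AuthorityGuess {
  public static void main(String[] args) {
    // The namenode appears to take the URI authority and parse it as
    // host:port.  With credentials embedded, the first ':' splits in
    // the wrong place and the "port" isn't a number.
    String authority = URI.create("s3://KEY:SECRET@BUCKET").getAuthority();
    int colon = authority.indexOf(':');  // splits "KEY" / "SECRET@BUCKET"
    if (colon < 0)                       // the plain s3://bucket case
      throw new RuntimeException("Not a host:port pair: " + authority);
    Integer.parseInt(authority.substring(colon + 1));
    // -> NumberFormatException: For input string: "SECRET@BUCKET"
  }
}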

Anyone who has dealt with these issues, please post!  It will help
make the project better.

-lincoln

--
lincolnritter.com



On Wed, Jul 9, 2008 at 7:10 AM, slitz <slitzferrari@gmail.com> wrote:
> I'm having the exact same problem.  Any tips?
>
> slitz
>
> On Wed, Jul 2, 2008 at 12:34 AM, Lincoln Ritter <lincoln@lincolnritter.com>
> wrote:
>
>> Hello,
>>
>> I am trying to use S3 with Hadoop 0.17.0 on EC2.  Using this style of
>> configuration:
>>
>> <property>
>>  <name>fs.default.name</name>
>>  <value>s3://$HDFS_BUCKET</value>
>> </property>
>>
>> <property>
>>  <name>fs.s3.awsAccessKeyId</name>
>>  <value>$AWS_ACCESS_KEY_ID</value>
>> </property>
>>
>> <property>
>>  <name>fs.s3.awsSecretAccessKey</name>
>>  <value>$AWS_SECRET_ACCESS_KEY</value>
>> </property>
>>
>> on startup of the cluster, with a bucket name containing only
>> alphabetic characters, I get:
>>
>> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
>> java.lang.RuntimeException: Not a host:port pair: XXXXX
>>        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>
>> If I use this style of configuration:
>>
>> <property>
>>  <name>fs.default.name</name>
>>  <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
>> </property>
>>
>> I get (where the all-caps portions are the actual values...):
>>
>> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
>> java.lang.NumberFormatException: For input string:
>> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
>>        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>        at java.lang.Integer.parseInt(Integer.java:447)
>>        at java.lang.Integer.parseInt(Integer.java:497)
>>        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
>>        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
>>        at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>>
>> These exceptions are taken from the namenode log.  The datanode logs
>> show the same exceptions.
>>
>> Other than the above configuration changes, the configuration is
>> identical to that generated by the hadoop image creation script found
>> in the 0.17.0 distribution.
>>
>> Can anybody point me in the right direction here?
>>
>> -lincoln
>>
>> --
>> lincolnritter.com
>>
>
