hadoop-common-dev mailing list archives

From: "Lincoln Ritter" <linc...@lincolnritter.com>
Subject: Question about handling of paths
Date: Wed, 09 Jul 2008 23:29:01 GMT
Greetings,

This question is inspired by the thread on the user list:
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200807.mbox/%3C5dd5b7e40807011634g43ff1351l57399fa323853f09@mail.gmail.com%3E

Basically, there seems to be a lot of trouble using S3 as
'fs.default.name'.  Here is what I'm trying (quoting for convenience):

> <property>
>   <name>fs.default.name</name>
>   <value>s3://$HDFS_BUCKET</value>
> </property>
>
> <property>
>   <name>fs.s3.awsAccessKeyId</name>
>   <value>$AWS_ACCESS_KEY_ID</value>
> </property>
>
> <property>
>   <name>fs.s3.awsSecretAccessKey</name>
>   <value>$AWS_SECRET_ACCESS_KEY</value>
> </property>
>
> On starting the cluster, with a bucket name containing only
> alphabetic characters, I get:
>
> 2008-07-01 16:10:49,171 ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.RuntimeException: Not a host:port pair: XXXXX
> 	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:121)
> 	at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
> 	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
> 	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
> 	at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
> 	at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>
> If I use this style of configuration:
>
> <property>
>   <name>fs.default.name</name>
>   <value>s3://$AWS_ACCESS_KEY:$AWS_SECRET_ACCESS_KEY@$HDFS_BUCKET</value>
> </property>
>
> I get (where the all-caps portions are the actual values...):
>
> 2008-07-01 19:05:17,540 ERROR org.apache.hadoop.dfs.NameNode:
> java.lang.NumberFormatException: For input string:
> "AWS_SECRET_ACCESS_KEY@HDFS_BUCKET"
> 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> 	at java.lang.Integer.parseInt(Integer.java:447)
> 	at java.lang.Integer.parseInt(Integer.java:497)
> 	at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
> 	at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:121)
> 	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
> 	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
> 	at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
> 	at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
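
Looking at the two stack traces, I think both failures come from the
same place: NetUtils.createSocketAddr apparently expects a plain
"host:port" string.  Here's a minimal sketch of my reading of that
parsing (a paraphrase based on the stack traces, not the actual
Hadoop source; the class and method names here are mine):

    import java.net.InetSocketAddress;

    class AddrParseSketch {
        // Paraphrase of what NetUtils.createSocketAddr appears to do
        // with the authority of fs.default.name (not the real source).
        static InetSocketAddress parse(String target) {
            int colon = target.indexOf(':');
            if (colon < 0) {
                // "s3://BUCKET" yields the authority "BUCKET", which
                // has no ':' at all, hence "Not a host:port pair".
                throw new RuntimeException(
                    "Not a host:port pair: " + target);
            }
            String host = target.substring(0, colon);
            // With "KEY:SECRET@BUCKET", everything after the first
            // ':' is "SECRET@BUCKET"; parsing that as a port number
            // throws the NumberFormatException shown above.
            int port = Integer.parseInt(target.substring(colon + 1));
            return new InetSocketAddress(host, port);
        }
    }

That would explain both errors exactly.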

Now, I've gotten distcp to work, but I can't get Hadoop fired up using
S3 as its storage medium.  I'm a neophyte when it comes to this
codebase, but a look at the implementations of distcp
(o.a.h.util.CopyFiles) and, say, NameNode (o.a.h.dfs.NameNode) seems
to indicate that the paths are handled very differently.  Specifically
(and I'm a bit out of my depth here), it looks like
NameNode#initialize gets passed a String version of the authority
portion of the 'fs.default.name' URI and tries to create a socket
address from it.  CopyFiles.setup, by contrast, asks for a FileSystem
for the specified Path.  CopyFiles makes sense to me - I can see how
the FileSystem is created, etc.
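
To make the contrast concrete, here's roughly how I understand the two
code paths (paraphrasing from memory, so the exact calls may be off):

    // Assuming the usual imports (o.a.h.fs.FileSystem, o.a.h.fs.Path,
    // o.a.h.net.NetUtils, java.net.URI, java.net.InetSocketAddress)
    // and a Configuration 'conf'.

    // distcp / CopyFiles style: the Path's URI scheme selects the
    // FileSystem implementation (s3, hdfs, file, ...).
    FileSystem fs = new Path("s3://bucket/dir").getFileSystem(conf);

    // NameNode style: the authority of fs.default.name is handed
    // straight to createSocketAddr as if it were "host:port".
    InetSocketAddress addr = NetUtils.createSocketAddr(
        URI.create(conf.get("fs.default.name")).getAuthority());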

NameNode doesn't make sense to me - shouldn't a FileSystem be created
from fs.default.name instead of "blindly" creating a socket?
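
Something along these lines is what I'd naively expect (just a sketch
of the idea, untested; same assumed imports as above):

    // Only treat fs.default.name as "host:port" when it actually
    // names an HDFS filesystem; otherwise there is no NameNode to
    // bind a socket for.
    URI defaultUri = URI.create(conf.get("fs.default.name"));
    if ("hdfs".equals(defaultUri.getScheme())) {
        InetSocketAddress addr =
            NetUtils.createSocketAddr(defaultUri.getAuthority());
        // ... continue NameNode startup against addr, as today ...
    } else {
        // s3://, file://, etc. are served entirely by their
        // FileSystem implementations, so skip the socket setup.
    }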

Is this a bug or am I completely off base here?  If I'm off base, can
someone give me an explanation of what I'm missing or point me in the
right direction?  If this seems like a bug, what suggestions do you
have for ways to address it?  I'm happy to code it up, but, like I
say, I'm new here ;-).

Any help is appreciated.

-lincoln

--
lincolnritter.com
