hadoop-hdfs-user mailing list archives

From Bjoern Schiessle <bjo...@schiessle.org>
Subject Re: Best way to write files to hdfs (from a Python app)
Date Wed, 11 Aug 2010 11:39:50 GMT
On Tue, 10 Aug 2010 09:39:17 -0700 Philip Zeyliger wrote:
> On Tue, Aug 10, 2010 at 5:06 AM, Bjoern Schiessle
> <bjoern@schiessle.org>wrote:
> > On Mon, 9 Aug 2010 16:35:07 -0700 Philip Zeyliger wrote:
> > > To give you an example of how this may be done, HUE, under the
> > > covers, pipes your data to 'bin/hadoop fs
> > > -Dhadoop.job.ugi=user,group put - path'. (That's from memory, but
> > > it's approximately right; the full python code is at
> > >
> > http://github.com/cloudera/hue/blob/master/desktop/libs/hadoop/src/hadoop/fs/hadoopfs.py#L692
> > > )
> >
> > Thank you! If I understand it correctly this only works if my python
> > app runs on the same server as hadoop, right?
> >
> 
> It works only if your python app has network connectivity to your
> namenode. You can access an explicitly specified HDFS by passing
> -Dfs.default.name=hdfs://<namenode>:<namenode_port>/
> .  (The default is read from hadoop-site.xml (or perhaps hdfs-site.xml),
> and, I think, defaults to file:///).

Thank you. This sounds really good! I tried it, but I still have a problem.

The namenode is defined in hadoop/conf/core-site.xml. On the namenode it
looks like this:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoopserver:9000</value>
</property>

I have now copied the whole hadoop directory to the client where the
python app runs.

If I run "hadoop fs -ls /"
I get a message that it can't connect to the server, and hadoop tries to
connect again and again:

10/08/11 12:06:34 INFO ipc.Client: Retrying connect to server: hadoopserver/129.69.216.55:9000. Already tried 0 time(s).
10/08/11 12:06:35 INFO ipc.Client: Retrying connect to server: hadoopserver/129.69.216.55:9000. Already tried 1 time(s).

From the client I can access the web interface of the namenode
(hadoopserver:50070). "Browse the file system" links to
http://pcmoholynagy:50070/nn_browsedfscontent.jsp but if I click the
link I get redirected to
http://localhost:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
which of course can't be accessed from the client. If I replace
"localhost" with "hadoopserver" it works.

Maybe the wrong redirection also causes the problem when I call "bin/hadoop
fs -ls /"?

I have tried to find something by reading the documentation and by
googling, but I couldn't find a solution.

Any ideas?

Thanks!
Björn
