hadoop-common-user mailing list archives

From Michael Bieniosek <mich...@powerset.com>
Subject Re: hadoop client
Date Tue, 11 Sep 2007 20:54:18 GMT
There is a java hadoop client you can run with

$HADOOP_HOME/bin/hadoop [--config /path/to/config/dir] fs -help

Supposedly there are also WebDAV and FUSE HDFS implementations, but I don't
know anything about them.
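To make the client setup concrete: a machine outside the cluster only needs the Hadoop distribution plus a config directory whose hadoop-site.xml points fs.default.name at the namenode. A minimal sketch follows; the hostname, port, and paths are placeholders I made up, not values from this thread:

```shell
# Hypothetical client-side config directory (namenode host/port invented).
mkdir -p /tmp/hdfs-client-conf
cat > /tmp/hdfs-client-conf/hadoop-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
EOF
# Then, from the client machine, point the hadoop script at that directory:
#   $HADOOP_HOME/bin/hadoop --config /tmp/hdfs-client-conf fs -ls /
```

The client never has to run on the namenode or a datanode itself; it just needs network access to the namenode's RPC port.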


On 9/11/07 1:11 PM, "Earney, Billy C." <earney@umsystem.edu> wrote:

> Greetings!
> I've been reading through the documentation, and there is one piece of
> information I'm not finding (or I missed it).  Let's say you have a
> cluster of machines, one being the namenode and the rest serving as
> datanodes.  Does a client process (a process trying to
> insert/delete/read files) need to be running on the namenode or
> datanodes, or can it run on another machine?
> If a client process can run on another machine, can someone give an
> example and the configuration to do such a thing?  I've seen there has
> been some work done on WebDAV with Hadoop, and was wondering if a
> machine that is not part of the cluster could access HDFS with something
> like WebDAV (or a similar tool)?
> Thanks!
> -----Original Message-----
> From: Tom White [mailto:tom.e.white@gmail.com]
> Sent: Tuesday, September 11, 2007 2:16 PM
> To: hadoop-user@lucene.apache.org
> Subject: Re: Accessing S3 with Hadoop?
>> I just updated the page to add a Notes section explaining the issue
>> and referencing the JIRA issue # you mentioned earlier.
> Great - thanks.
>>> Are you able to do 'bin/hadoop-ec2 launch-cluster' then (on your
> workstation)
>>> . bin/hadoop-ec2-env.sh
>>> ssh $SSH_OPTS "root@$MASTER_HOST" "sed -i -e
>>> \"s/$MASTER_HOST/\$(hostname)/g\"
>>> /usr/local/hadoop-$HADOOP_VERSION/conf/hadoop-site.xml"
>>> and then check to see if the master host has been set correctly (to
>>> the internal IP) in the master host's hadoop-site.xml.
>> Well, no, since my $MASTER_HOST is now just the external DNS name of
>> the first instance started in the reservation, but this is performed
>> as part of my launch-hadoop-cluster script. In any case, that value is
>> not set to the internal IP, but rather to the hostname portion of the
>> internal DNS name.
> This is a bit of a mystery to me - I'll try to reproduce it on my
> workstation.
>> Currently, my MR jobs are failing because the reducers can't copy the
>> map output and I'm thinking it might be because there is some kind of
>> external address getting in there somehow. I see connections to
>> external IPs in netstat -tan (72.* addresses). Any ideas about that?
>> In the hadoop-site.xml's on the slaves, the address is the external
>> DNS name of the master (ec2-*) but that resolves to the internal 10/8
>> address like it should.
>>> Also, what version of the EC2 tools are you using?
>> black:~/code/hadoop-0.14.0/src/contrib/ec2> ec2-version
>> 1.2-11797 2007-03-01
>> black:~/code/hadoop-0.14.0/src/contrib/ec2>
> I'm using the same version so that's not it.
>>> Instances are terminated on the basis of their AMI ID since 0.14.0.
>>> See https://issues.apache.org/jira/browse/HADOOP-1504.
>> I felt this was unsafe as it was, since it looked for a name of an
>> image and then reversed it to the AMI ID. I just hacked it so you have
>> to put in the AMI ID in hadoop-ec2-env.sh. Also, the script as it is
>> right now doesn't grep for 'running' so may potentially shut down some
>> instances starting up in another cluster. I may just be paranoid,
>> however ;)
> Checking for 'running' is a good idea. I've relied on the version number
> so folks can easily select the version of hadoop they want on the
> cluster. Perhaps the best solution would be to allow an optional
> parameter to the terminate script to specify the AMI ID if you need
> extra certainty (the script already prompts with a list of instances
> to terminate).
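The 'running' check discussed above can be sketched as a small filter. This is a hedged illustration, not the actual script: I'm assuming the classic tab/whitespace-separated ec2-describe-instances output where column 2 is the instance ID, column 3 the AMI ID, and column 6 the state, and the sample rows below are invented:

```shell
# Keep only instance IDs that match the wanted AMI *and* are in state
# 'running', so instances still 'pending' (possibly another cluster
# booting up) are left alone. Input mirrors ec2-describe-instances
# INSTANCE rows; field positions are an assumption about that format.
filter_running() {
  awk -v ami="$1" '$1 == "INSTANCE" && $3 == ami && $6 == "running" { print $2 }'
}

# Invented sample rows standing in for `ec2-describe-instances` output:
filter_running ami-12345678 <<'EOF'
INSTANCE i-aaaa1111 ami-12345678 ec2-1.example.com ip-10-0-0-1 running
INSTANCE i-bbbb2222 ami-12345678 ec2-2.example.com ip-10-0-0-2 pending
INSTANCE i-cccc3333 ami-99999999 ec2-3.example.com ip-10-0-0-3 running
EOF
# prints: i-aaaa1111
```

The surviving IDs could then be fed to the terminate command (after the existing confirmation prompt), which addresses both the AMI-ID concern and the pending-instances concern in one place.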
> Tom
