hadoop-common-user mailing list archives

From Foss User <foss...@gmail.com>
Subject Newbie questions on Hadoop local directories?
Date Sun, 05 Apr 2009 08:14:03 GMT
I am trying to learn Hadoop, and a lot of questions come to my mind as I go
through it. So I will be asking a few questions here from time to time until
I feel completely comfortable with it. Here are some questions now:

1. Is it true that Hadoop should be installed in the same location on
all Linux machines? As far as I understand, it is necessary to install
it at the same path on all nodes only if I am going to use
bin/start-dfs.sh and bin/start-mapred.sh to start the data nodes and
task trackers on all slaves. Otherwise, it is not required. How
correct am I?
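
To show where this question comes from: my rough understanding is that
bin/start-dfs.sh reads conf/slaves and ssh'es into each slave, running
hadoop-daemon.sh from the same ${HADOOP_HOME} path on every machine. This
is only a sketch from my reading, not the actual script:

  # conf/slaves lists one slave hostname per line
  for slave in $(cat "${HADOOP_HOME}/conf/slaves"); do
    # the same ${HADOOP_HOME} directory is assumed to exist on every slave,
    # which is why I think the install location has to match
    ssh "$slave" "${HADOOP_HOME}/bin/hadoop-daemon.sh start datanode" &
  done
  wait

If the daemons are instead started by hand on each slave, I assume the
install path no longer matters.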

2. Say a slave goes down (due to network problems or a power cut) while
a word count job is running. When it comes up again, what do I need to
do? Is running bin/hadoop-daemon.sh start datanode and
bin/hadoop-daemon.sh start tasktracker enough for recovery? Do I
have to delete any /tmp/hadoop-hadoop directories before starting? Is
it guaranteed that on starting, any corrupt files in the tmp directory
would be discarded and everything would be restored to normal?
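
Concretely, this is all I was planning to run on the recovered slave, from
the Hadoop install directory, plus a health check from the master (I am
assuming fsck is the right tool to check the blocks afterwards):

  # on the slave that came back up
  bin/hadoop-daemon.sh start datanode
  bin/hadoop-daemon.sh start tasktracker

  # from the master: check for missing or under-replicated blocks
  bin/hadoop fsck /

Is that sufficient, or are there extra cleanup steps?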

3. Say I have 1 master and 4 slaves, and I start a datanode on 2 slaves
and a tasktracker on the other two. I put files into HDFS, which means
the files would be stored on the first two datanodes. Then I run
a word count job, which means the word count tasks would run on the
two task trackers. How would the two task trackers now get the files
to do the word counting? In the documentation I read that jobs are
run on the nodes which have the data, but in this setup the data nodes
and task trackers are separate. So how will the word count job do its
work?
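
For reference, this is roughly the setup I am describing (the examples
jar name may differ between Hadoop versions; wordcount is the example
job shipped with Hadoop):

  # on slave1 and slave2 only
  bin/hadoop-daemon.sh start datanode

  # on slave3 and slave4 only
  bin/hadoop-daemon.sh start tasktracker

  # from the master: load the input and run the job
  bin/hadoop fs -put input.txt /user/hadoop/input
  bin/hadoop jar hadoop-*-examples.jar wordcount /user/hadoop/input /user/hadoop/output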
