hadoop-common-user mailing list archives

From "Jeff Payne" <je...@eyealike.com>
Subject Re: basic questions about Hadoop!
Date Thu, 28 Aug 2008 23:04:31 GMT
Gerardo:

I can't really speak to all of your questions, but the master/slave issue is
a common concern with Hadoop.  A cluster has a single namenode and therefore
a single point of failure.  There is also a secondary namenode process,
which in most default configurations runs on the same machine as the
namenode; you can move it to a different machine by adjusting the masters
file.  One of the more experienced lurkers should feel free to correct me,
but my understanding is that the secondary namenode keeps track of the same
index information used by the primary namenode.  So, if the namenode fails,
there is no automatic recovery, but you can always tweak your cluster
configuration to make the secondary namenode the primary and safely restart
the cluster.
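
If it helps, here's roughly what that looks like in a 0.18-era install.
This is just a sketch; the hostname and checkpoint path are placeholders,
not anything from your setup:

    # conf/masters -- start-dfs.sh starts the secondary namenode
    # on the host(s) listed here
    snn-host.example.com

    <!-- conf/hadoop-site.xml: where the secondary namenode
         writes its checkpoints -->
    <property>
      <name>fs.checkpoint.dir</name>
      <value>/data/hadoop/namesecondary</value>
    </property>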

As for the storage of files, the namenode is really just the traffic cop
for HDFS.  No HDFS file data is actually stored on that machine; it's
basically used as a directory and lock manager.  The files themselves are
stored on multiple datanodes, and I'm pretty sure all the actual file I/O
happens directly between the client and the respective datanodes.
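
To make that concrete, here's a minimal sketch of a client write using the
Java API (the namenode URI and the paths are made-up placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsPut {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder URI; normally this comes from conf/hadoop-site.xml.
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        FileSystem fs = FileSystem.get(conf);
        // The namenode only answers the metadata questions (which datanodes
        // hold which blocks); the file bytes below stream straight from
        // this client to the datanodes.
        fs.copyFromLocalFile(new Path("/tmp/input.txt"),
                             new Path("/user/gerardo/input.txt"));
        fs.close();
      }
    }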

Perhaps one of the more hardcore Hadoop people on here will point it out if
I'm giving bad advice.


On Thu, Aug 28, 2008 at 2:28 PM, Gerardo Velez <jgerardo.velez@gmail.com> wrote:

> Hi Everybody!
>
> I'm a newbie with Hadoop. I've installed it as a single node in a
> pseudo-distributed environment, but I would like to go further and
> configure a complete Hadoop cluster. But I have the following questions.
>
> 1.- I understand that HDFS has a master/slave architecture, where the
> master server manages the file system namespace and regulates access to
> files by clients. So, what happens in a cluster environment if the master
> server fails or is down due to network issues? Does a slave become the
> master server, or something like that?
>
>
> 2.- What about the Hadoop filesystem from the client's point of view?
> Should the client only store files in HDFS through the master server, or
> can clients store the files to be processed in HDFS through a slave
> server as well?
>
>
> 3.- Until now, what I'm doing to run Hadoop is:
>
>    1.- Copy the file to be processed from the Linux file system to HDFS.
>    2.- Run the Hadoop shell:   hadoop jar <jarfile> <input> <output>
>    3.- The results are stored in the output directory.
>
>
> Is there any way to run Hadoop as a daemon, so that when a file is stored
> in HDFS it is processed automatically with Hadoop (without having to run
> the Hadoop shell every time)?
>
>
> 4.- What happens to processed files? Are they deleted from HDFS
> automatically?
>
>
> Thanks in advance!
>
>
> -- Gerardo Velez
>



-- 
Jeffrey Payne
Lead Software Engineer
Eyealike, Inc.
jeffp@eyealike.com
www.eyealike.com
(206) 257-8708


"Anything worth doing is worth overdoing."
-H. Lifter
