hadoop-common-user mailing list archives

From "Jeff Payne" <je...@eyealike.com>
Subject Re: basic questions about Hadoop!
Date Fri, 29 Aug 2008 00:10:18 GMT
You can use the hadoop command line on machines that aren't hadoop servers.
If you copy the hadoop configuration from one of your master servers or data
nodes to the client machine and run the command-line dfs tools, it will copy
the files directly to the data nodes.
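
For example, with the cluster's conf/ directory copied to the client machine,
something like this should work (the config path and HDFS path here are just
placeholders):

    hadoop --config /path/to/copied/conf dfs -put \
        /local/data/input.txt /user/gerardo/input.txt

The client reads the namenode address from that configuration and streams the
file data to the datanodes itself.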

Or, you could use one of the client libraries.  The Java client, for example,
allows you to open an output stream and start dumping bytes into it.
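
Here's a rough sketch of what that looks like (the class name, namenode
host/port, and paths are placeholders; you'd also need the hadoop jar and the
cluster configuration available on the client):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
      public static void main(String[] args) throws IOException {
        // Point the client at the cluster; substitute your own
        // namenode host/port and destination path.
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://serverB:9000");

        FileSystem fs = FileSystem.get(conf);

        // Open an output stream in HDFS and dump bytes into it.
        FSDataOutputStream out = fs.create(new Path("/user/gerardo/input.txt"));
        out.write("bytes generated on server A".getBytes());
        out.close();

        fs.close();
      }
    }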

On Thu, Aug 28, 2008 at 5:05 PM, Gerardo Velez <jgerardo.velez@gmail.com> wrote:

> Hi Jeff, thank you for answering!
>
> What about remote writing on HDFS? Let's suppose I have an application
> server on a Linux server A, and I have a Hadoop cluster on servers B
> (master), C (slave), and D (slave).
>
> What I would like is to send some files from server A to be processed by
> Hadoop. In order to do so, what do I need to do? Do I need to send those
> files to the master server first and then copy them to HDFS?
>
> Or can I pass those files to any slave server?
>
> Basically I'm looking for remote writing, because the files to be processed
> are not generated on any Hadoop server.
>
> Thanks again!
>
> -- Gerardo
>
> On Thu, Aug 28, 2008 at 4:04 PM, Jeff Payne <jeffp@eyealike.com> wrote:
>
> > Gerardo:
> >
> > I can't really speak to all of your questions, but the master/slave issue
> > is a common concern with hadoop.  A cluster has a single namenode and
> > therefore a single point of failure.  There is also a secondary namenode
> > process, which runs on the same machine as the namenode in most default
> > configurations.  You can make it a different machine by adjusting the
> > masters file.  One of the more experienced lurkers should feel free to
> > correct me, but my understanding is that the secondary namenode keeps
> > track of all the same index information used by the primary namenode.
> > So, if the namenode fails, there is no automatic recovery, but you can
> > always tweak your cluster configuration to make the secondary namenode
> > the primary and safely restart the cluster.
> >
> > As for the storage of files, the namenode is really just the traffic cop
> > for HDFS.  No HDFS files are actually stored on that machine.  It's
> > basically used as a directory and lock manager, etc.  The files are
> > stored on multiple datanodes, and I'm pretty sure all the actual file I/O
> > happens directly between the client and the respective datanodes.
> >
> > Perhaps one of the more hardcore hadoop people on here will point it out
> > if I'm giving bad advice.
> >
> >
> > On Thu, Aug 28, 2008 at 2:28 PM, Gerardo Velez
> > <jgerardo.velez@gmail.com> wrote:
> >
> > > Hi Everybody!
> > >
> > > I'm a newbie with Hadoop. I've installed it as a single node in a
> > > pseudo-distributed environment, but I would like to go further and
> > > configure a complete Hadoop cluster. I have the following questions.
> > >
> > > 1.- I understand that HDFS has a master/slave architecture, and that
> > > the master server manages the file system namespace and regulates
> > > access to files by clients. So, what happens in a cluster environment
> > > if the master server fails or is down due to network issues? Does a
> > > slave become the master server, or something like that?
> > >
> > >
> > > 2.- What about the Hadoop file system from the client's point of view?
> > > Should the client only store files in HDFS through the master server,
> > > or can clients store the files to be processed in HDFS through a slave
> > > server as well?
> > >
> > >
> > > 3.- Until now, what I'm doing to run Hadoop is:
> > >
> > >    1.- copy the file to be processed from the Linux file system to HDFS
> > >    2.- run the Hadoop shell:   hadoop jar jarfile input output
> > >    3.- the results are stored in the output directory
> > >
> > >
> > > Is there any way to have Hadoop run as a daemon, so that when a file is
> > > stored in HDFS it is processed automatically?
> > >
> > > (without having to run the hadoop shell every time)
> > >
> > >
> > > 4.- What happens to the processed files? Are they deleted from HDFS
> > > automatically?
> > >
> > >
> > > Thanks in advance!
> > >
> > >
> > > -- Gerardo Velez
> > >
> >
> >
> >
> > --
> > Jeffrey Payne
> > Lead Software Engineer
> > Eyealike, Inc.
> > jeffp@eyealike.com
> > www.eyealike.com
> > (206) 257-8708
> >
> >
> > "Anything worth doing is worth overdoing."
> > -H. Lifter
> >
>



-- 
Jeffrey Payne
Lead Software Engineer
Eyealike, Inc.
jeffp@eyealike.com
www.eyealike.com
(206) 257-8708


"Anything worth doing is worth overdoing."
-H. Lifter
