hadoop-common-user mailing list archives

From "Gerardo Velez" <jgerardo.ve...@gmail.com>
Subject Re: basic questions about Hadoop!
Date Fri, 29 Aug 2008 00:22:28 GMT
Thanks Jeff and sorry for bothering you again!

Remote writing into HDFS is clear to me now, but what about the Hadoop processing step?

Once the file has been copied to HDFS, do I still need to run

hadoop -jarfile  input output   every time?

If I need to do it every time, should I do it from the remote server as well?


Thanks for helping and for your patience


-- Gerardo
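
For illustration, the two-step workflow asked about here (copy the file, then run the job) could be driven from the remote machine with a small script. This is only a sketch under assumptions: it assumes the `hadoop` launcher and a copy of the cluster configuration are available on the client, it uses the `hadoop fs -put` and `hadoop jar` forms of the commands, it assumes the jar names its own main class, and the function names and paths are hypothetical. Nothing in the thread suggests the job starts by itself after the copy, so the sketch submits it explicitly each time.

```python
import subprocess

# Assumption: the hadoop launcher is on the PATH of the client machine.
HADOOP_BIN = "hadoop"


def put_command(local_path, hdfs_dir):
    """Build the command that copies a local file into an HDFS directory."""
    return [HADOOP_BIN, "fs", "-put", local_path, hdfs_dir]


def job_command(jar_path, input_dir, output_dir):
    """Build the command that submits a job packaged in a jar."""
    return [HADOOP_BIN, "jar", jar_path, input_dir, output_dir]


def copy_and_process(local_path, jar_path, input_dir, output_dir,
                     run=subprocess.check_call):
    """Copy one file to HDFS, then submit the job, both from the client.

    `run` is injectable so the command sequence can be inspected
    without a Hadoop installation.
    """
    commands = [
        put_command(local_path, input_dir),
        job_command(jar_path, input_dir, output_dir),
    ]
    for cmd in commands:
        run(cmd)  # check_call raises if a command exits non-zero
```

A call might look like `copy_and_process("/var/app/events.log", "myjob.jar", "/input", "/output")`, where all four values are placeholders.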


On Thu, Aug 28, 2008 at 5:10 PM, Jeff Payne <jeffp@eyealike.com> wrote:

> You can use the hadoop command line on machines that aren't hadoop servers.
> If you copy the hadoop configuration from one of your master servers or data
> nodes to the client machine and run the command line dfs tools, it will copy
> the files directly to the data node.
>
> Or, you could use one of the client libraries.  The java client, for
> example, allows you to open up an output stream and start dumping bytes on
> it.
>
> On Thu, Aug 28, 2008 at 5:05 PM, Gerardo Velez <jgerardo.velez@gmail.com> wrote:
>
> > Hi Jeff, thank you for answering!
> >
> > What about remote writing to HDFS? Let's suppose I have an application
> > server on Linux server A and a Hadoop cluster on servers B (master),
> > C (slave), and D (slave).
> >
> > What I would like is to send some files from server A to be processed by
> > Hadoop. In order to do so, what do I need to do? Do I need to send those
> > files to the master server first and then copy them to HDFS?
> >
> > Or can I pass those files to any slave server?
> >
> > Basically I'm looking for remote writing, because the files to be processed
> > are not being generated on any Hadoop server.
> >
> > Thanks again!
> >
> > -- Gerardo
> >
> >
> >
> > Regarding
> >
> > On Thu, Aug 28, 2008 at 4:04 PM, Jeff Payne <jeffp@eyealike.com> wrote:
> >
> > > Gerardo:
> > >
> > > I can't really speak to all of your questions, but the master/slave issue
> > > is a common concern with hadoop.  A cluster has a single namenode and
> > > therefore a single point of failure.  There is also a secondary name node
> > > process which runs on the same machine as the name node in most default
> > > configurations.  You can make it a different machine by adjusting the
> > > masters file.  One of the more experienced lurkers should feel free to
> > > correct me, but my understanding is that the secondary name node keeps
> > > track of all the same index information used by the primary name node.
> > > So, if the namenode fails, there is no automatic recovery, but you can
> > > always tweak your cluster configuration to make the secondary namenode
> > > the primary and safely restart the cluster.
> > >
> > > As for the storage of files, the name node is really just the traffic cop
> > > for HDFS.  No HDFS files are actually stored on that machine.  It's
> > > basically used as a directory and lock manager, etc.  The files are stored
> > > on multiple datanodes and I'm pretty sure all the actual file I/O happens
> > > directly between the client and the respective datanodes.
> > >
> > > Perhaps one of the more hardcore hadoop people on here will point it out
> > > if I'm giving bad advice.
> > >
> > >
> > > On Thu, Aug 28, 2008 at 2:28 PM, Gerardo Velez <jgerardo.velez@gmail.com> wrote:
> > >
> > > > Hi Everybody!
> > > >
> > > > I'm a newbie with Hadoop. I've installed it on a single node as a
> > > > pseudo-distributed environment, but I would like to go further and
> > > > configure a complete hadoop cluster. But I have the following questions.
> > > >
> > > > 1.- I understand that HDFS has a master/slave architecture, and that
> > > > the master server manages the file system namespace and regulates
> > > > access to files by clients. So, what happens in a cluster environment
> > > > if the master server fails or is down due to network issues?
> > > > Does a slave become the master server or something?
> > > >
> > > >
> > > > 2.- What about the Hadoop filesystem from the client's point of view?
> > > > Should the client only store files in HDFS on the master server, or are
> > > > clients able to store the files to be processed in HDFS from a slave
> > > > server as well?
> > > >
> > > >
> > > > 3.- Until now, what I'm doing to run hadoop is:
> > > >
> > > >    1.- Copy the file to be processed from the Linux file system to HDFS
> > > >    2.- Run hadoop shell   hadoop   -jarfile  input output
> > > >    3.- The results are stored in the output directory
> > > >
> > > >
> > > > Is there any way to run hadoop as a daemon, so that when a file is
> > > > stored in HDFS it is processed automatically by hadoop?
> > > >
> > > > (without having to run the hadoop shell every time)
> > > >
> > > >
> > > > 4.- What happens with processed files? Are they deleted from HDFS
> > > > automatically?
> > > >
> > > >
> > > > Thanks in advance!
> > > >
> > > >
> > > > -- Gerardo Velez
> > > >
> > >
> > >
> > >
> > > --
> > > Jeffrey Payne
> > > Lead Software Engineer
> > > Eyealike, Inc.
> > > jeffp@eyealike.com
> > > www.eyealike.com
> > > (206) 257-8708
> > >
> > >
> > > "Anything worth doing is worth overdoing."
> > > -H. Lifter
> > >
> >
>
>
>
>
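
Question 3 in the original message asks whether files can be processed automatically once they land in HDFS, and no reply in this thread describes a built-in trigger. One possible workaround, sketched below under assumptions, is a small polling loop on a client machine that submits a job for each path it has not seen before. The names `watch`, `list_paths`, and `submit` are hypothetical: `list_paths` stands for whatever lists the HDFS input directory (for example a wrapper around the command line dfs tools) and `submit` for whatever launches the job.

```python
import time


def watch(list_paths, submit, interval_s=60, max_polls=None, sleep=time.sleep):
    """Poll a directory listing and submit each new path exactly once.

    list_paths: callable returning the current paths (hypothetical helper).
    submit:     callable taking one path and launching the processing job.
    max_polls:  stop after this many polls; None means run until interrupted.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in list_paths():
            if path not in seen:
                submit(path)
                seen.add(path)  # remembered only after submit returns
        polls += 1
        sleep(interval_s)
```

The set of seen paths lives only in memory here, so a restart would resubmit everything; persisting it, or moving processed files elsewhere, would be a natural next step.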
