hadoop-common-user mailing list archives

From "shahab mehmandoust" <shaha...@gmail.com>
Subject Re: To Compute or Not to Compute on Prod
Date Fri, 31 Oct 2008 22:03:23 GMT
Currently, I'm just researching so I'm just playing with the idea of
streaming log data into the HDFS.

I'm confused about: "...all you need is a Hadoop install.  Your production
node doesn't need to be a datanode."  If my production node is *not* a
datanode, then how can I do "hadoop dfs -put"?

I was under the impression that when I install HDFS on a cluster, each node
in the cluster is a datanode.
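For reference, the client-only setup being discussed might look roughly like
this. The hostname, port, and paths below are illustrative assumptions, not
details from this thread; the config key shown is the one used by Hadoop
releases of this era (0.18):

```
# Hypothetical setup on the production box, which runs NO datanode or
# tasktracker daemons -- only the Hadoop client install is present.
#
# conf/hadoop-site.xml points the client at the cluster's remote namenode:
#
#   <property>
#     <name>fs.default.name</name>
#     <value>hdfs://namenode.example.com:9000</value>  <!-- assumed host/port -->
#   </property>
#
# With that in place, the standard CLI writes a local log file into HDFS;
# the client contacts the namenode, which directs the blocks to datanodes
# elsewhere in the cluster:
hadoop dfs -put /var/log/app/access.log /logs/access.log
```

The production machine acts purely as an HDFS client here; the actual block
storage and any map/reduce computation happen on the cluster nodes.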


On Fri, Oct 31, 2008 at 1:46 PM, Norbert Burger <norbert.burger@gmail.com> wrote:

> What are you using to "stream logs into the HDFS"?
> If the command-line tools (i.e., "hadoop dfs -put") work for you, then all
> you need is a Hadoop install.  Your production node doesn't need to be a
> datanode.
> On Fri, Oct 31, 2008 at 2:35 PM, shahab mehmandoust <shahab53@gmail.com> wrote:
> > I want to stream data from logs into the HDFS in production, but I do NOT
> > want my production machine to be a part of the computation cluster.  The
> > reason I want to do it this way is to take advantage of HDFS without
> > putting computation load on my production machine.  Is this possible?
> > Furthermore, is this unnecessary because the computation would not put a
> > significant load on my production box (obviously this depends on the
> > map/reduce implementation, but I'm asking in general)?
> >
> > I should note that our prod machine hosts our core web application and
> > database (saving up for another box :-).
> >
> > Thanks,
> > Shahab
> >
