hadoop-mapreduce-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: OK to run data node on same machine as secondary name node?
Date Thu, 16 Aug 2012 02:27:40 GMT
Please keep in mind that you can run an entire cluster on a single server. (Pseudodistributed mode.)

Having said that, just because you can do something doesn't mean it's a good idea to do it. :-)

With respect to the secondary NN and DN on the same machine? Sure. If the machine has enough
power, why not? 
However, I would probably recommend not doing it.

If you're running Apache based Hadoop, you would want to configure your NN, SN, JT nodes differently
from your DNs.
But again, there's no reason why it can't be done....

On Aug 15, 2012, at 5:53 PM, jcfolsom@pureperfect.com wrote:

> Not an expert but... I think a lot of it depends on your usage pattern.
> How many machines are we talking about? What is the replication factor?
> If it's only two machines, there would need to be a datanode on both in
> order to provide replication.
> I guess you could also keep the secondary name node on the same machine
> as the name node and just have the other(s) be a data node. After all,
> that is the default configuration. It certainly makes administration
> easier, since you would only need two machine images instead of three.
> If the name node goes down you're hosed anyway. The downside would be
> memory consumption on the name node, but recovery time would be faster.
> It may not be the best configuration for write-heavy workloads and at
> some point you are going to hit a hardware ceiling.
> -------- Original Message --------
> Subject: OK to run data node on same machine as secondary name node?
> From: David Rosenstrauch <darose@darose.net>
> Date: Wed, August 15, 2012 6:11 pm
> To: user@hadoop.apache.org
> I have a Hadoop cluster that's a little tight on resources. I was 
> thinking one way I could solve this could be by running an additional 
> data node on the same machine as the secondary name node.
> I wouldn't dare do that on the primary name node, since that machine 
> needs to be extremely performant. But since all the secondary name node
> does is merge the name node's checkpoint and logs, which is
> not an activity that requires top-notch real-time performance, I thought
> it might not be a problem if I were to set up a data node running there 
> as well.
> Any reasons why that might be a bad idea?
> Thanks,
> DR
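
The placement questions raised in this thread (are there enough data nodes for the replication factor, and is a data node sharing a machine with the primary name node?) can be sketched as a small sanity check. This is only an illustration: the role names, host names, and the check_layout helper are hypothetical and are not part of Hadoop or its configuration format.

```python
# Hypothetical role labels for illustration; these are not Hadoop identifiers.
NAMENODE = "namenode"
SECONDARY = "secondarynamenode"
DATANODE = "datanode"


def check_layout(layout, replication):
    """Return warnings for a mapping of host name -> set of roles."""
    warnings = []

    # With fewer data nodes than the replication factor, blocks would
    # stay under-replicated.
    datanode_hosts = [h for h, roles in layout.items() if DATANODE in roles]
    if len(datanode_hosts) < replication:
        warnings.append(
            f"replication factor {replication} cannot be met by "
            f"{len(datanode_hosts)} data node(s)"
        )

    # The original poster wanted to keep data nodes off the primary name node.
    for host, roles in sorted(layout.items()):
        if NAMENODE in roles and DATANODE in roles:
            warnings.append(f"{host}: data node shares a machine with the name node")

    return warnings


# The layout asked about: a data node co-located with the secondary name node.
layout = {
    "master": {NAMENODE},
    "backup": {SECONDARY, DATANODE},
    "worker1": {DATANODE},
}
print(check_layout(layout, replication=2))
```

The check deliberately does not warn about the secondary name node sharing a host with a data node, since the replies above treat that as workable when the machine has enough capacity.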
