hadoop-user mailing list archives

From <jcfol...@pureperfect.com>
Subject RE: OK to run data node on same machine as secondary name node?
Date Wed, 15 Aug 2012 22:53:57 GMT

Not an expert but... I think a lot of it depends on your usage pattern.

How many machines are we talking about? What is the replication factor?
If it's only two machines, there would need to be a datanode on both in
order to provide replication.
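
To make the replication point concrete, here is a rough sketch in Python. It is not Hadoop code, and the function names are made up for illustration. The only assumption is that HDFS stores at most one replica of a given block per data node, so the number of live data nodes caps how many copies you can actually get, whatever dfs.replication is set to.

```python
def achievable_replicas(replication_factor: int, datanodes: int) -> int:
    """Copies of one block that can actually be placed.

    Assumes at most one replica of a block per data node
    (hypothetical helper, not part of any Hadoop API).
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if datanodes < 0:
        raise ValueError("datanodes cannot be negative")
    return min(replication_factor, datanodes)


def is_under_replicated(replication_factor: int, datanodes: int) -> bool:
    """True when the cluster cannot satisfy the requested replication."""
    return achievable_replicas(replication_factor, datanodes) < replication_factor


if __name__ == "__main__":
    # Compare a few cluster sizes against a requested factor of 2 and of 3.
    for factor in (2, 3):
        for nodes in (1, 2, 3):
            print(factor, nodes,
                  achievable_replicas(factor, nodes),
                  is_under_replicated(factor, nodes))
```

So with two machines and a replication factor of 2, you only get a second copy if both machines run a data node.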

I guess you could also keep the secondary name node on the same machine
as the name node and just have the other machine(s) be data nodes. After
all, that is the default configuration. It certainly makes administration
easier, since you would only need two machine images instead of three.
If the name node goes down you're hosed anyway. The downside would be
extra memory consumption on the name node machine, but recovery time
would be faster. It may not be the best configuration for write-heavy
workloads, and at some point you are going to hit a hardware ceiling.

-------- Original Message --------
Subject: OK to run data node on same machine as secondary name node?
From: David Rosenstrauch <darose@darose.net>
Date: Wed, August 15, 2012 6:11 pm
To: user@hadoop.apache.org

I have a Hadoop cluster that's a little tight on resources. I was
thinking one way to solve this would be to run an additional
data node on the same machine as the secondary name node.

I wouldn't dare do that on the primary name node, since that machine
needs to be extremely performant. But since all the secondary name node
does is merge the name node's checkpoint and logs, which is
not an activity that requires top-notch real-time performance, I thought
it might not be a problem if I were to set up a data node running there
as well.

Any reasons why that might be a bad idea?


