hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Single datanode setup
Date Tue, 30 Mar 2010 10:45:22 GMT
Ed Mazur wrote:
> Hi,
> I have a 12 node cluster where instead of running a DN on each compute
> node, I'm running just one DN backed by a large RAID (with a
> dfs.replication of 1). The compute node storage is limited, so the
> idea behind this was to free up more space for intermediate job data.
> So the cluster has that one node with the DN, a master node with the
> JT/NN, and 10 compute nodes each with a TT. I am running 0.20.1+169.68
> from Cloudera.
> The problem is that MR job performance is now worse than when using a
> traditional HDFS setup. A job that took 76 minutes before now takes
> 169 minutes. I've used this single DN setup before on a
> similarly-sized cluster without any problems, so what can I do to find
> the bottleneck?

I wouldn't use HDFS in this situation. Your network will be the 
bottleneck. If you have a SAN, high-end filesystem and/or fast network, 
just use file:// URLs and let the underlying OS/network handle it. I 
know people who use alternate filesystems this way. Side benefit: the NN 
is no longer an SPOF. Just your storage array. But they never fail, right?
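
To put a rough number on the network point, here is a back-of-the-envelope 
sketch. The 1 Gbit/s link speed is an assumption (the original message does 
not say what the network is), the function name is made up for illustration, 
and protocol overhead and disk limits are ignored, so treat the result as a 
best-case ceiling rather than a measurement.

```python
def per_client_mb_per_s(link_gbit_per_s: float, clients: int) -> float:
    """Best-case share of a single server link per client, in MB/s (decimal)."""
    link_mb_per_s = link_gbit_per_s * 1000 / 8  # convert Gbit/s to MB/s
    return link_mb_per_s / clients


# Assumed: the lone DN has one 1 Gbit/s link (hypothetical figure).
# From the original message: 10 compute nodes, each running a TT.
share = per_client_mb_per_s(link_gbit_per_s=1.0, clients=10)
print(f"{share:.1f} MB/s per compute node at best")
```

Under those assumptions each compute node gets a small fraction of the DN's 
link when all ten read at once, which is typically well below what a local 
disk read would deliver in a traditional layout with a DN on every node.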

Having a single DN and NN is a waste of effort here. There's no 
locality, no replication, so no need for the replication and locality 
features of HDFS. Try mounting the filestore everywhere with NFS (or 
other protocol of choice), and skip HDFS entirely.

