hadoop-common-user mailing list archives

From Da Zheng <zhengda1...@gmail.com>
Subject the overhead of HDFS
Date Wed, 02 Feb 2011 21:10:53 GMT

I have been running Hadoop on two clusters: one with AMD Opteron 2212
processors clocked at 2GHz, and one with Atom N330 processors clocked at
1.6GHz. Both are dual-core. I always use HDFS to store input and output
data, and I observe high CPU consumption caused by HDFS on both clusters.
On the AMD cluster, the bottleneck is the disk. I use TestDFSIO to measure
performance. The write throughput to HDFS is about 50MB/s when the
replication factor is 1 and each node runs one mapper, but CPU
consumption is about 50% for the DataNode and about 40% for the TestDFSIO
mapper. On the Atom cluster, the bottleneck is the CPU. With the same
settings I get similar write throughput, but CPU consumption is close
to 100% for both the DataNode and the mapper. Could anyone tell me what
CPU usage you see on your clusters?

