hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-2144) Data node process consumes 180% cpu
Date Fri, 02 Nov 2007 14:03:50 GMT
Data node process consumes 180% cpu 

                 Key: HADOOP-2144
                 URL: https://issues.apache.org/jira/browse/HADOOP-2144
             Project: Hadoop
          Issue Type: Improvement
            Reporter: Runping Qi

I ran a DFS read-throughput test and found that the data node process consumes up to 180%
cpu when it is under heavy load. Here are the details:

The cluster has 380+ machines, each with 3GB mem and 4 cpus and 4 disks.
I copied a 10GB file to dfs from one machine with a data node running there.
Based on the dfs block placement policy, that machine has one replica of each block of the
file. Then I ran 4 of the following commands in parallel:

hadoop dfs -cat thefile > /dev/null &
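
For anyone repeating the test, the four parallel readers could be started and awaited with a small wrapper like the one below. This is only a sketch: the function name run_parallel_readers is hypothetical and not part of Hadoop, and the reader command is passed in as arguments so the wrapper can be dry-run with any other command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): start N copies of a command in
# the background, discard their stdout, and wait for all of them.
# The return status is the number of copies that exited non-zero.
run_parallel_readers() {
    n="$1"; shift
    i=0
    pids=""
    while [ "$i" -lt "$n" ]; do
        "$@" > /dev/null &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

# The setup described in this report would correspond to:
# run_parallel_readers 4 hadoop dfs -cat thefile
```

Watching the data node process with a tool such as top while the readers run would then show the per-process cpu figure discussed below.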

Since all the blocks have a local replica, all the read requests went to the local data node.
I observed that:

    The data node process's cpu usage was around 180% most of the time.

    The clients' cpu usage was moderate (as it should be).

    All the four disks were working concurrently with comparable read throughput.

    The total read throughput was maxed at 90MB/Sec, about 60% of the expected total 
    aggregated max read throughput of 4 disks (160MB/Sec)
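
The figures above can be checked with a few lines of integer arithmetic. This is a sketch under two assumptions that are not stated in the report: that the 180% figure uses the convention where 100% equals one fully busy cpu, and that the 160MB/Sec maximum is split evenly across the 4 disks. The variable names are made up for illustration.

```shell
#!/bin/sh
# Numbers taken from the report.
observed_mb_s=90       # measured aggregate read throughput
expected_mb_s=160      # stated aggregate maximum for 4 disks
disks=4
cpus=4
datanode_cpu_pct=180   # assumed convention: 100% = one fully busy cpu

# Shell arithmetic is integer-only, so results are truncated.
utilization_pct=$((observed_mb_s * 100 / expected_mb_s))
per_disk_expected_mb_s=$((expected_mb_s / disks))
machine_cpu_pct=$((datanode_cpu_pct / cpus))

echo "disk throughput utilization: ${utilization_pct}%"
echo "expected per-disk maximum:   ${per_disk_expected_mb_s} MB/s"
echo "data node share of all cpus: ${machine_cpu_pct}%"
```

Under those assumptions the observed throughput is roughly 56% of the stated maximum, which the report rounds to about 60%, and the data node alone accounts for close to half of the machine's total cpu capacity.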

The data node's cpu usage seems unreasonably high.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
