hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2144) Data node process consumes 180% cpu
Date Fri, 02 Nov 2007 18:55:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539665 ]

Raghu Angadi commented on HADOOP-2144:

A few initial observations :

With 4 clients, it is cpu bound; pretty much no cpu is left, so the lower disk read b/w
is expected.

Assuming 'top' correctly adds up the % cpu of each of the threads for each cpu, there is 400%
cpu on the machine (not sure if this is 4 cores or 2 cpus with 2 hyper threads each).

DataNode takes : 180% on 4 threads
4 Clients take : 200-205% on 4 processes.

So the DataNode does not eat up more cpu than the clients. The main difference is that the DataNode
has extra work reading from disk/kernel into user space, while the clients do checksum verification.
So checksum verification takes a bit more cpu than copying from disk to user space.

For me, the most indicative sign of a performance issue / potential improvement is the user vs. kernel
cpu split: overall, it is 70% user cpu and 25% kernel. For a job that essentially copies data
from one process to another, user cpu seems excessive compared to kernel cpu. Extra buffer copies
(see HADOOP-1702) are partly to blame.
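The extra buffer copies can in principle be avoided with the NIO zero-copy path; a minimal sketch (not the actual DataNode code, just an illustration of the two approaches) follows. The 64KB buffer size and file names are arbitrary choices for the demo:

```java
import java.io.*;
import java.nio.channels.FileChannel;

public class CopyDemo {
    // Buffered copy: every byte crosses kernel -> user space -> kernel,
    // which shows up as user cpu in 'top'.
    static void bufferedCopy(File src, File dst) throws IOException {
        try (InputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
    }

    // Zero-copy: FileChannel.transferTo lets the kernel move the bytes
    // directly (sendfile on Linux), skipping the user-space buffer.
    static void zeroCopy(File src, File dst) throws IOException {
        try (FileChannel in = new FileInputStream(src).getChannel();
             FileChannel out = new FileOutputStream(dst).getChannel()) {
            long pos = 0, size = in.size();
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("src", ".dat");
        File a = File.createTempFile("a", ".dat");
        File b = File.createTempFile("b", ".dat");
        try (FileOutputStream out = new FileOutputStream(src)) {
            byte[] data = new byte[1 << 20]; // 1MB of test data
            new java.util.Random(42).nextBytes(data);
            out.write(data);
        }
        bufferedCopy(src, a);
        zeroCopy(src, b);
        // Both paths must produce a byte-identical copy.
        System.out.println(a.length() == src.length() && b.length() == src.length());
    }
}
```

Whether the DataNode's read path can actually use transferTo depends on where the checksum handling happens, so this is only a sketch of the general technique.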

> Data node process consumes 180% cpu 
> ------------------------------------
>                 Key: HADOOP-2144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2144
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Runping Qi
> I did a test on DFS read throughput and found that the data node
> process consumes up to 180% cpu when it is under heavy load. Here are the details:
> The cluster has 380+ machines, each with 3GB mem and 4 cpus and 4 disks.
> I copied a 10GB file to dfs from one machine with a data node running there.
> Based on the dfs block placement policy, that machine has one replica for each block
> of the file.
> Then I ran 4 of the following commands in parallel:
> hadoop dfs -cat thefile > /dev/null &
> Since all the blocks have a local replica, all the read requests went to the local data node.
> I observed that:
>     The data node process's cpu usage was around 180% for most of the time.
>     The clients' cpu usage was moderate (as it should be).
>     All the four disks were working concurrently with comparable read throughput.
>     The total read throughput was maxed at 90MB/Sec, about 60% of the expected total
>     aggregated max read throughput of 4 disks (160MB/Sec). Thus disks were not a bottleneck
>     in this case.
> The data node's cpu usage seems unreasonably high.
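For reference, the "about 60%" figure above works out from the numbers quoted in the report (90MB/Sec observed, 4 disks at an assumed 40MB/Sec each for the 160MB/Sec aggregate):

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        double observed = 90.0;        // MB/s, measured total read throughput
        double perDisk = 40.0;         // MB/s, assumed per-disk max (4 * 40 = 160 total)
        double aggregate = 4 * perDisk;
        double fraction = observed / aggregate;
        System.out.printf("%.0f%%%n", fraction * 100); // prints 56%, i.e. "about 60%"
    }
}
```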

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
