hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-3164) Use FileChannel.transferTo() when data is read from DataNode.
Date Wed, 09 Apr 2008 18:19:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587285#action_12587285 ]

rangadi edited comment on HADOOP-3164 at 4/9/08 11:18 AM:
---------------------------------------------------------------

The following table shows 'dfs -cat' of 10 GB of data. This is a disk-bound test, and CPU is
measured from /proc/<pid>/stat. io.file.buffer.size is 128k. This is a cluster with a single
datanode, and the client and datanode are on the same machine. The three fields reported for
each run are user CPU, kernel CPU, and wall-clock time. "Total cpu" is the sum of user and
kernel CPU for the DataNode process.
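The per-process CPU numbers above come from /proc/<pid>/stat, where utime (field 14) and stime (field 15) hold user and kernel jiffies. A minimal parsing sketch (not the actual measurement script; the sample line is hypothetical, with utime/stime taken from Run 1 of the table):

```java
public class ProcStatCpu {
    // Sketch: extract utime (field 14) and stime (field 15) from a
    // /proc/<pid>/stat line. Fields are counted from after the ")" that
    // closes the command name, since the name itself may contain spaces.
    static long[] userSysJiffies(String statLine) {
        String rest = statLine.substring(statLine.lastIndexOf(')') + 2);
        String[] f = rest.split(" ");
        // f[0] is field 3 (state), so fields 14 and 15 land at f[11] and f[12].
        return new long[] { Long.parseLong(f[11]), Long.parseLong(f[12]) };
    }

    public static void main(String[] args) {
        // Hypothetical stat line; 2589/5208 mirror Run 1 for trunk above.
        String sample = "4242 (java) S 1 4242 4242 0 -1 4194560 999 0 0 0 "
                + "2589 5208 0 0 20 0 30 0 100 0 0";
        long[] us = userSysJiffies(sample);
        System.out.println(us[0] + us[1]); // total cpu in jiffies -> 7797
    }
}
```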

|| Test || Bound by || Run 1 || Run 2 || Run 3 || Cpu % || Avg Total Cpu ||
| Trunk | Disk | 2589u 5208k 253s | 2659u 5162k 265s | 2827u 5341k 328s | 100% | *7929* |
| Trunk + patch | Disk | 0474u 1038k 228s | 0477u 1031k 232s | 0611u 1189k 301s | 20% | *1607* |

This shows the DataNode takes about 80% less CPU.

Also, since we don't allocate any user buffer, we could invoke transferTo() to send even
larger amounts of data at a time. I haven't experimented with larger buffer sizes.
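The send path being described can be sketched as a transferTo() loop: the file's bytes go from the page cache to the socket without ever being copied into a user-space buffer, which is where the CPU savings come from. This is only an illustrative sketch, not the patch's actual BlockSender code; it writes to an in-memory channel so it is self-contained:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToDemo {
    // Sketch: stream a whole file to a channel with transferTo(), looping
    // because a single call may transfer fewer bytes than requested.
    static long sendFile(Path path, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(path, StandardOpenOption.READ)) {
            long pos = 0, size = in.size();
            while (pos < size) {
                // Zero-copy where the OS supports it (e.g. sendfile on Linux).
                long n = in.transferTo(pos, size - pos, out);
                if (n <= 0) break;
                pos += n;
            }
            return pos;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".bin");
        byte[] data = new byte[300_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Files.write(tmp, data);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long sent = sendFile(tmp, Channels.newChannel(sink));
        System.out.println(sent == data.length
                && java.util.Arrays.equals(sink.toByteArray(), data));
        Files.delete(tmp);
    }
}
```

With a real SocketChannel as the target, the chunk size per call is no longer bounded by io.file.buffer.size, which is why larger transfers per call are worth trying.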

> Use FileChannel.transferTo() when data is read from DataNode.
> -------------------------------------------------------------
>
>                 Key: HADOOP-3164
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3164
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-3164.patch, HADOOP-3164.patch, HADOOP-3164.patch
>
>
> HADOOP-2312 talks about using FileChannel's [{{transferTo()}}|http://java.sun.com/javase/6/docs/api/java/nio/channels/FileChannel.html#transferTo(long,%20long,%20java.nio.channels.WritableByteChannel)]
and [{{transferFrom()}}|http://java.sun.com/javase/6/docs/api/java/nio/channels/FileChannel.html#transferFrom(java.nio.channels.ReadableByteChannel,%20long,%20long)]
in DataNode. 
> At the time, the DataNode neither used NIO sockets nor wrote large chunks of contiguous
block data to the socket. Hadoop 0.17 does both when data is served to clients (and other
datanodes). I am planning to try using transferTo() in the trunk. This might reduce the
DataNode's CPU by another 50% or more.
> Once HADOOP-1702 is committed, we can look into using transferFrom().
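For the receive side mentioned above, transferFrom() is the mirror image: bytes are pulled from a channel straight into a file, again without a user-space copy. A minimal sketch under the same caveats (illustrative only, using an in-memory source so it is self-contained):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferFromDemo {
    // Sketch: receive a known number of bytes from a channel into a file
    // with transferFrom(), looping since one call may transfer fewer bytes
    // than requested.
    static long receiveFile(ReadableByteChannel in, Path path, long expected)
            throws IOException {
        try (FileChannel out = FileChannel.open(path, StandardOpenOption.WRITE,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            while (pos < expected) {
                long n = out.transferFrom(in, pos, expected - pos);
                if (n <= 0) break;
                pos += n;
            }
            return pos;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello datanode".getBytes();
        ReadableByteChannel src = Channels.newChannel(new ByteArrayInputStream(data));
        Path tmp = Files.createTempFile("recv", ".bin");
        long got = receiveFile(src, tmp, data.length);
        System.out.println(got == data.length
                && java.util.Arrays.equals(Files.readAllBytes(tmp), data));
        Files.delete(tmp);
    }
}
```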

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

