hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-3672) support for persistent connections to improve random read performance.
Date Mon, 14 Jul 2008 23:24:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613485#action_12613485 ]

rangadi edited comment on HADOOP-3672 at 7/14/08 4:24 PM:
---------------------------------------------------------------

> [...] So I don't see that RPC adds more copies.
In the current implementation? I think the current implementation serializes and deserializes
all the arguments and returned objects. Doesn't that imply extra copies?

Also, to match the current implementation, RPC would somehow have to support kernel transfer
too (this is the reason the datanode takes 10 times less CPU than in 0.16 while serving data).
It is of course possible to enhance RPC to support all of this, and maybe some of these CPU
benefits are not required.

I should probably just wait for a design for RPC transfers.
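
To make the copy argument above concrete, here is a minimal sketch, assuming "kernel transfer" refers to the sendfile-style path that Java exposes as FileChannel.transferTo. The class and method names (TransferSketch, copyThroughBuffer, transferDirect) are hypothetical and are not taken from the DataNode code. The first method stages every byte in a user-space buffer, as a serializing RPC layer would; the second asks the OS to move the bytes to the target channel directly where the platform supports it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class TransferSketch {

    /** Copies the file through a user-space buffer: every byte enters the JVM and leaves it again. */
    static long copyThroughBuffer(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
            long total = 0;
            while (in.read(buf) != -1) {
                buf.flip();
                while (buf.hasRemaining()) {
                    total += out.write(buf);
                }
                buf.clear();
            }
            return total;
        }
    }

    /** Lets the OS move bytes from the file to the target channel, skipping the user-space buffer when possible. */
    static long transferDirect(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            long pos = 0;
            while (pos < size) {
                // transferTo may move fewer bytes than requested, so loop until done.
                long n = in.transferTo(pos, size - pos, out);
                if (n <= 0) {
                    break; // target cannot accept more right now; return what was sent
                }
                pos += n;
            }
            return pos;
        }
    }
}
```

Both methods deliver the same bytes. Any CPU difference depends on the platform and on the target channel type, so the 10x figure quoted above should not be expected from this sketch.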




      was (Author: rangadi):
    > [...] So I don't see that RPC adds more copies.
current implementation? I think current implementation serializes/deserializes all the arguments
and returned objects, doesn't it imply extra copies? 

Also to match with the current implementation, it some how support kernel transfer too (this
is the reason datanode takes 10 times less cpu compared to 0.16 while serving data). It is
of course possible to enhance RPC to all these and may some of these CPU benefits are not
required.

I should probably just  wait for a design for RPC transfers.



  
> support for persistent connections to improve random read performance.
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-3672
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3672
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.17.0
>         Environment: Linux 2.6.9-55  , Dual Core Opteron 280 2.4Ghz , 4GB memory
>            Reporter: George Wu
>         Attachments: pread_test.java
>
>
> pread() calls establish a new connection per request. YourKit Java profiles show that this
> connection overhead is significant on the DataNode.
> I wrote a simple microbenchmark program that does many iterations of pread() from different
> offsets of a large file. I hacked the DFSClient/DataNode code to reuse the same connection and
> DataNode request handler thread. The performance improvement was 7% when the data is served
> from disk and 80% when the data is served from the OS page cache.
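
The attached pread_test.java is not reproduced in this message. As a rough illustration of the access pattern described above, the following sketch issues many positional reads at pseudo-random offsets of a large file. It is an assumption-laden stand-in: it reads a local file through java.nio's FileChannel rather than going through DFSClient, and the names PreadLoopSketch and randomPreads are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Random;

class PreadLoopSketch {

    /**
     * Performs `iterations` positional reads of `readSize` bytes each at
     * pseudo-random offsets and returns the total number of bytes read.
     */
    static long randomPreads(Path file, int iterations, int readSize, long seed) throws IOException {
        Random rnd = new Random(seed);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size < readSize) {
                throw new IllegalArgumentException("file is smaller than one read");
            }
            ByteBuffer buf = ByteBuffer.allocate(readSize);
            for (int i = 0; i < iterations; i++) {
                // Pick an offset that leaves room for a full read.
                long offset = (long) (rnd.nextDouble() * (size - readSize + 1));
                buf.clear();
                long pos = offset;
                // A positional read does not move the channel's own position.
                while (buf.hasRemaining()) {
                    int n = ch.read(buf, pos);
                    if (n < 0) {
                        break; // unexpected end of file
                    }
                    pos += n;
                }
                total += buf.position();
            }
        }
        return total;
    }
}
```

Timing the call with System.nanoTime() before and after gives a crude throughput number. Running it against a file that is already in the OS page cache and then against a cold file would correspond to the two cases the reporter distinguishes.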

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

