hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3672) support for persistent connections to improve random read performance.
Date Mon, 14 Jul 2008 23:00:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613480#action_12613480 ]

Doug Cutting commented on HADOOP-3672:
--------------------------------------

> Avoiding buffer copies is an absolute must to even consider the approach [...]

The client input stream can pass its buffer directly to RPC. So long as the RPC's internal
input stream buffer is no larger than the client's buffer, Java's standard stream cascading
convention (a buffered stream whose own buffer is empty hands a read at least as large as
that buffer straight to the underlying stream) means that most data would be read directly
from the socket into the client's buffer.  To minimize copying, the RPC's internal input
stream buffer should be quite small, just big enough to hold header information like the
call id.  So I don't see that RPC adds more copies.
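
For illustration only, here is a minimal sketch of that cascading behaviour using plain
java.io classes rather than Hadoop's RPC code. The names CascadeDemo, TrackingStream and
payloadReadDirectly are hypothetical, a ByteArrayInputStream stands in for the socket, and
the 4-byte call id header is an assumption made for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class CascadeDemo {
    /** Stand-in for the socket stream: remembers the array handed to the latest bulk read. */
    static class TrackingStream extends FilterInputStream {
        byte[] lastTarget;

        TrackingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            lastTarget = b;
            return super.read(b, off, len);
        }
    }

    /** Returns true if the payload read went straight into the caller's buffer. */
    static boolean payloadReadDirectly(int rpcBufferSize, int payloadSize) {
        try {
            // Assumed wire layout for this sketch: 4-byte call id followed by the payload.
            byte[] wire = new byte[4 + payloadSize];
            TrackingStream socket = new TrackingStream(new ByteArrayInputStream(wire));
            DataInputStream rpcIn =
                new DataInputStream(new BufferedInputStream(socket, rpcBufferSize));

            rpcIn.readInt(); // the header passes through the internal buffer

            byte[] clientBuffer = new byte[payloadSize];
            rpcIn.read(clientBuffer, 0, payloadSize);

            // Identity check: did the underlying stream fill the client's array itself?
            return socket.lastTarget == clientBuffer;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("4-byte internal buffer, direct read: "
            + payloadReadDirectly(4, 1024));
        System.out.println("8192-byte internal buffer, direct read: "
            + payloadReadDirectly(8192, 1024));
    }
}
```

With the 4-byte internal buffer, the header read drains the buffer and the following bulk
read bypasses it. With the 8192-byte buffer, the header read pulls the payload into the
internal buffer as well, so the payload reaches the client's array only through an extra copy.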

> support for persistent connections to improve random read performance.
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-3672
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3672
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.17.0
>         Environment: Linux 2.6.9-55  , Dual Core Opteron 280 2.4Ghz , 4GB memory
>            Reporter: George Wu
>         Attachments: pread_test.java
>
>
> pread() establishes a new connection per request. YourKit Java profiles show that this
> connection overhead is pretty significant on the DataNode.
> I wrote a simple microbenchmark program which does many iterations of pread() from different
> offsets of a large file. I hacked the DFSClient/DataNode code to re-use the same connection/DataNode
> request handler thread. The performance improvement was 7% when the data is served from disk
> and 80% when the data is served from the OS page cache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

