hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3672) support for persistent connections to improve random read performance.
Date Tue, 15 Jul 2008 00:03:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613495#action_12613495
] 

Raghu Angadi commented on HADOOP-3672:
--------------------------------------

> Perhaps a separate issue should be filed to explore what would be needed for RPC to be
usable by the mapred shuffle?

Thats right, we should first list the requirements in the new jira. I can surely list good-to-haves
for HDFS transfers.

> I don't think so. It calls their write(OutputStream) methods. [...]
:). It does, but OutputStream is a {{DataOutputBuffer()}}. see Client.java:476. Similarly
on the server, we first read/write the data to a single buffer.


> support for persistent connections to improve random read performance.
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-3672
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3672
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.17.0
>         Environment: Linux 2.6.9-55  , Dual Core Opteron 280 2.4Ghz , 4GB memory
>            Reporter: George Wu
>         Attachments: pread_test.java
>
>
> preads() establish new connections per request. yourkit java profiles show that this
connection overhead is pretty significant on the DataNode. 
> I wrote a simple microbenchmark program which does many iterations of pread() from different
offsets of a large file. I hacked DFSClient/DataNode code to re-use the same connection/DataNode
request handler thread. The performance improvement was 7% when the data is served from disk
and 80% when the data is served from the OS page cache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message