hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1702) Reduce buffer copies when data is written to DFS
Date Thu, 08 May 2008 18:07:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595350#action_12595350
] 

Tsz Wo (Nicholas), SZE commented on HADOOP-1702:
------------------------------------------------

> 1. Currently DFSClient moves the checksum down to fill the gap when a partial packet
needs to be sent. To avoid the copy, we could instead store the checksums in the reversed
order starting from the dataStart.

Or, is it good to interleave data and checksums?

> Reduce buffer copies when data is written to DFS
> ------------------------------------------------
>
>                 Key: HADOOP-1702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1702
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.18.0
>
>         Attachments: HADOOP-1702.patch, HADOOP-1702.patch, HADOOP-1702.patch, HADOOP-1702.patch,
HADOOP-1702.patch, HADOOP-1702.patch
>
>
> HADOOP-1649 adds extra buffering to improve write performance.  The following diagram
shows buffers as pointed by (numbers). Each eatra buffer adds an extra copy since most of
our read()/write()s match the io.bytes.per.checksum, which is much smaller than buffer size.
> {noformat}
>        (1)                 (2)          (3)                 (5)
>    +---||----[ CLIENT ]---||----<>-----||---[ DATANODE ]---||--<>-> to
Mirror  
>    | (buffer)                  (socket)           |  (4)
>    |                                              +--||--+
>  =====                                                    |
>  =====                                                  =====
>  (disk)                                                 =====
> {noformat}
> Currently loops that read and write block data, handle one checksum chunk at a time.
By reading multiple chunks at a time, we can remove buffers (1), (2), (3), and (5). 
> Similarly some copies can be reduced when clients read data from the DFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message