hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2154) Non-interleaved checksums would optimize block transfers.
Date Mon, 26 Nov 2007 17:29:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545517 ]

Raghu Angadi commented on HADOOP-2154:

In my initial implementation of HADOOP-1134, I did not keep buffers between the socket and the
datanode (reader and writer). It looks like this jira proposes the same thing. Note that I had to
put the buffers back, since there was a regression on the DFSIO benchmarks and sort. Pretty much
none of our benchmarks are CPU intensive on Datanodes.

If we want to get rid of the extra buffer copies, I would look into one of these two options:
# Reorganize the while loop so that there is one extra copy (from disk to user buffer) and
not two, i.e. a large user buffer is written directly to the socket (in the case of a block read).
# Remove both copies by extending the protocol so that one DATA_CHUNK can carry multiple CHECKSUM
chunks, e.g. one DATA_CHUNK would contain 64k worth of block data, read directly into the user
buffer, with 64k*4/512 checksum bytes at the end. The Datanode then reads directly into a large user
buffer and that buffer is written to the socket (basically bringing buffer handling back to pre
# Using multiple sockets is another option, but I am not a fan of it.
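
To make the second option concrete, here is a rough sketch of a DATA_CHUNK whose checksums all trail the data. It is only an illustration, not the actual Datanode wire format: the function names, the use of CRC32 through Python's zlib, and the big-endian 4-byte checksum encoding are assumptions made for the example. The 64k data size, the 512-byte checksum chunk, and the 4 bytes per checksum come from the comment above.

```python
import struct
import zlib

BYTES_PER_CHECKSUM = 512  # data bytes covered by one checksum (from the comment)
CHECKSUM_SIZE = 4         # bytes per checksum; CRC32 is an assumption here


def checksum_bytes_for(data_len):
    """Number of checksum bytes that trail a DATA_CHUNK of data_len bytes."""
    chunks = (data_len + BYTES_PER_CHECKSUM - 1) // BYTES_PER_CHECKSUM
    return chunks * CHECKSUM_SIZE


def build_data_chunk(data):
    """Lay out one hypothetical DATA_CHUNK: all block data, then all checksums.

    The data region is filled with a single bulk copy, standing in for a
    read from disk straight into the large buffer.
    """
    packet = bytearray(len(data) + checksum_bytes_for(len(data)))
    packet[:len(data)] = data
    view = memoryview(packet)
    pos = len(data)
    for off in range(0, len(data), BYTES_PER_CHECKSUM):
        end = min(off + BYTES_PER_CHECKSUM, len(data))
        crc = zlib.crc32(view[off:end]) & 0xFFFFFFFF
        struct.pack_into(">I", packet, pos, crc)
        pos += CHECKSUM_SIZE
    return bytes(packet)


def verify_data_chunk(packet, data_len):
    """Recompute each chunk checksum and compare it with the trailing copy."""
    if len(packet) != data_len + checksum_bytes_for(data_len):
        return False
    view = memoryview(packet)
    for i, off in enumerate(range(0, data_len, BYTES_PER_CHECKSUM)):
        end = min(off + BYTES_PER_CHECKSUM, data_len)
        (expected,) = struct.unpack_from(">I", packet, data_len + i * CHECKSUM_SIZE)
        if zlib.crc32(view[off:end]) & 0xFFFFFFFF != expected:
            return False
    return True
```

With these numbers a 64k DATA_CHUNK carries 128 checksums, i.e. 512 checksum bytes, so the whole buffer can go to the socket in one write.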

> Non-interleaved checksums would optimize block transfers.
> ---------------------------------------------------------
>                 Key: HADOOP-2154
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2154
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Konstantin Shvachko
>            Assignee: Rajagopal Natarajan
>             Fix For: 0.16.0
> Currently when a block is transferred to a data-node the client interleaves data chunks
> with the respective checksums.
> This requires creating an extra copy of the original data in a new buffer interleaved
> with the crcs.
> We can avoid extra copying if the data and the crc are fed to the socket one after another.
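
The difference described in the issue can be sketched roughly as below. This is a hedged illustration rather than the DFS client code: the helper names are hypothetical, CRC32 over 512-byte chunks is assumed, and placing each checksum after its data chunk in the interleaved form is also an assumption, since the description does not state the order.

```python
import socket
import struct
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size for this sketch


def chunk_checksums(data):
    """One packed CRC32 (assumed checksum type) per 512-byte chunk."""
    return [
        struct.pack(">I", zlib.crc32(data[off:off + BYTES_PER_CHECKSUM]) & 0xFFFFFFFF)
        for off in range(0, len(data), BYTES_PER_CHECKSUM)
    ]


def interleaved(data, sums):
    """Interleaved form: every data chunk is copied into a new buffer."""
    out = bytearray()
    for i, crc in enumerate(sums):
        off = i * BYTES_PER_CHECKSUM
        out += data[off:off + BYTES_PER_CHECKSUM]  # the extra copy of the data
        out += crc
    return bytes(out)


def send_sequential(sock, data, sums):
    """Non-interleaved form: the original data buffer goes to the socket as is,
    followed by the (much smaller) checksum bytes."""
    sock.sendall(data)
    sock.sendall(b"".join(sums))
```

In the sequential form only the checksums are gathered into a new buffer, which is 4 bytes per 512 bytes of data under the assumptions above.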

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
