hadoop-hdfs-issues mailing list archives

From "Zlatin Balevsky (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1034) Enhance datanode to read data and checksum file in parallel
Date Thu, 11 Mar 2010 23:36:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844272#action_12844272 ]

Zlatin Balevsky commented on HDFS-1034:

The only possible bottleneck is the extra disk seek, which may or may not be a big deal.  It
probably is for HBase-type workloads.  There are many ways around that, including but not limited to:

a) prepending a copy of the checksum file to the block file while keeping the separate copy
intact for off-thread verification after the transfer starts
b) using some ext4-extents jni magic
... ?

> Enhance datanode to read data and checksum file in parallel
> -----------------------------------------------------------
>                 Key: HDFS-1034
>                 URL: https://issues.apache.org/jira/browse/HDFS-1034
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
> In the current HDFS implementation, a read of a block issued to the datanode results
in a disk access to the checksum file followed by a disk access to the block file. It would
be nice to be able to do these two IOs in parallel to reduce read latency.
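
As a rough illustration of the idea in the issue description, the two reads can be submitted to a thread pool at the same time and joined afterwards. This is only a sketch under stated assumptions, not the datanode's actual code and not the patch proposed for HDFS-1034: the class ParallelBlockRead, the holder BlockAndChecksum, and the method readInParallel are hypothetical names, and both files are read whole, whereas a real datanode would stream them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch: none of these names come from the HDFS code base.
final class ParallelBlockRead {

    /** Holds the bytes read from the block file and from its checksum file. */
    static final class BlockAndChecksum {
        final byte[] data;
        final byte[] checksum;

        BlockAndChecksum(byte[] data, byte[] checksum) {
            this.data = data;
            this.checksum = checksum;
        }
    }

    /**
     * Starts both reads on the given pool so the two disk accesses can
     * overlap, then waits for both results.
     */
    static BlockAndChecksum readInParallel(Path blockFile, Path checksumFile,
                                           ExecutorService pool) throws IOException {
        CompletableFuture<byte[]> data =
                CompletableFuture.supplyAsync(() -> readAll(blockFile), pool);
        CompletableFuture<byte[]> sums =
                CompletableFuture.supplyAsync(() -> readAll(checksumFile), pool);
        try {
            return new BlockAndChecksum(data.join(), sums.join());
        } catch (CompletionException e) {
            // Surface the original IOException to the caller if there was one.
            if (e.getCause() instanceof UncheckedIOException) {
                throw ((UncheckedIOException) e.getCause()).getCause();
            }
            throw e;
        }
    }

    // Wraps the checked IOException so the read can run inside a Supplier.
    private static byte[] readAll(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would pass a pool with at least two threads, for example Executors.newFixedThreadPool(2). Whether the overlap actually reduces latency depends on where the two files sit on disk, which is the seek concern raised in the comment above.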

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
