hadoop-common-dev mailing list archives

From "Robert Chansler (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum
Date Tue, 21 Oct 2008 23:24:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3328:

    Release Note:   (was: When a client is writing data to DFS, only the last datanode in the
pipeline needs to verify the checksum. This saves around 30% CPU on intermediate datanodes.)

> DFS write pipeline : only the last datanode needs to verify checksum
> --------------------------------------------------------------------
>                 Key: HADOOP-3328
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3328
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.19.0
>         Attachments: HADOOP-3328.patch, HADOOP-3328.patch
> Currently all the datanodes in the DFS write pipeline verify the checksum. Since the current
protocol includes acks from the datanodes, an ack from the last node could also serve as
verification that the checksum is OK. In that sense, only the last datanode needs to verify the checksum.
Based on [this comment|http://issues.apache.org/jira/browse/HADOOP-1702?focusedCommentId=12575553#action_12575553]
from HADOOP-1702, CPU consumption might go down by another 25-30% (4/14) after HADOOP-1702.
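
The idea in the paragraph above can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use any Hadoop classes: Packet, Result, send() and the lastNodeOnly flag are hypothetical names made up for illustration, and CRC32 from java.util.zip stands in for whatever checksum the DFS protocol actually uses. The sketch only counts how many nodes in a pipeline compute a checksum before the client sees an ack.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class PipelineChecksumSketch {

    /** Hypothetical packet: payload plus the checksum computed by the client. */
    static final class Packet {
        final byte[] data;
        final long checksum;

        Packet(byte[] data, long checksum) {
            this.data = data;
            this.checksum = checksum;
        }
    }

    /** Hypothetical result: the ack the client sees and how many nodes ran a CRC. */
    static final class Result {
        final boolean ack;
        final int verifications;

        Result(boolean ack, int verifications) {
            this.ack = ack;
            this.verifications = verifications;
        }
    }

    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /**
     * Walks one packet through a pipeline of pipelineLength nodes.
     * If lastNodeOnly is true, only the final node verifies the checksum;
     * otherwise every node does (the behaviour described as "currently").
     */
    static Result send(Packet packet, int pipelineLength, boolean lastNodeOnly) {
        int verifications = 0;
        for (int node = 0; node < pipelineLength; node++) {
            boolean isLast = (node == pipelineLength - 1);
            if (isLast || !lastNodeOnly) {
                verifications++;
                if (crc32(packet.data) != packet.checksum) {
                    // A failed check turns into a negative ack for the client.
                    return new Result(false, verifications);
                }
            }
            // An intermediate node would store the packet and forward it here.
        }
        return new Result(pipelineLength > 0, verifications);
    }

    public static void main(String[] args) {
        byte[] data = "block data".getBytes(StandardCharsets.UTF_8);
        Packet good = new Packet(data, crc32(data));
        Result r = send(good, 3, true);
        System.out.println("ack=" + r.ack + " verifications=" + r.verifications);
    }
}
```

In this toy model a corrupt packet still produces a negative ack when only the last node checks, while a clean packet costs one CRC computation instead of one per datanode.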

> Also, this would make it easier to use transferTo() and transferFrom() on intermediate
datanodes, since they don't need to look at the data.
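
For readers unfamiliar with transferTo(), here is a small sketch of java.nio.channels.FileChannel.transferTo(), which lets the caller hand bytes from a file to another channel without reading them into an application buffer. It is only an illustration of the API, not datanode code: the class TransferSketch and the method forward() are hypothetical, and a real datanode would be forwarding to a socket rather than the generic WritableByteChannel used here.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferSketch {

    /**
     * Forwards every byte of the file at source to downstream and returns
     * the number of bytes transferred. The data is never inspected here,
     * which is only acceptable if this node does not need to verify it.
     */
    static long forward(Path source, WritableByteChannel downstream) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = in.size();
            long sent = 0;
            while (sent < size) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = in.transferTo(sent, size - sent, downstream);
                if (n <= 0) {
                    break; // no progress; stop rather than spin
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

A caller would open the downstream channel itself (for example a FileChannel or SocketChannel) and pass it to forward(); whether the transfer avoids extra copies depends on the operating system and the channel types involved.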

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
