hadoop-hdfs-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica
Date Thu, 08 Feb 2018 06:32:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356540#comment-16356540 ]

Xiao Chen commented on HDFS-11187:

Thanks for working on the branch-2 patch, Gabor!

I'm not entirely familiar with this part of the code, so it would be great if [~jojochuang] or [~kihwal]
could double-check.

Some review comments:
- Agree we should be fine not porting HDFS-10636 (the ReplicaInfo -> File jira) to branch-2,
as Gabor said.
- Should keep the import change (remove fnfe).
- Should remove {{getLastChecksumAndDataLen}}.
- I think we should probably keep the trunk patch's logic in {{addFinalizedBlock}}. But given
this interface difference, maybe we can move it to FsDatasetImpl, after {{addFinalizedBlock}}
returns and before creating the {{FinalizedReplica}} object?
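To illustrate that last suggestion, here is a rough, hypothetical sketch. The class and method names ({{addFinalizedBlock}}, {{FinalizedReplica}}, {{loadLastPartialChunkChecksum}}, the setter) mirror the real HDFS ones, but these are simplified stand-ins with made-up signatures, not the actual branch-2 code:

```java
import java.io.File;

class FinalizedReplica {
    private byte[] lastPartialChunkChecksum;
    void setLastPartialChunkChecksum(byte[] checksum) {
        this.lastPartialChunkChecksum = checksum;
    }
    byte[] getLastPartialChunkChecksum() { return lastPartialChunkChecksum; }
}

class FsDatasetImplSketch {
    // Stand-in for the branch-2 addFinalizedBlock(), which (without
    // HDFS-10636) deals in Files rather than ReplicaInfo objects.
    File addFinalizedBlock(File blockFile) { return blockFile; }

    // Stand-in for reading the last partial chunk checksum from the
    // on-disk meta file; the bytes here are dummy values.
    byte[] loadLastPartialChunkChecksum(File blockFile) {
        return new byte[] {1, 2, 3, 4};
    }

    FinalizedReplica finalizeBlock(File blockFile) {
        // 1. Move the block into the finalized directory.
        File dest = addFinalizedBlock(blockFile);
        // 2. Load the last partial chunk checksum once, here in
        //    FsDatasetImpl, after addFinalizedBlock() returns...
        byte[] checksum = loadLastPartialChunkChecksum(dest);
        // 3. ...and attach it when creating the FinalizedReplica, so
        //    readers never have to go back to disk for it.
        FinalizedReplica replica = new FinalizedReplica();
        replica.setLastPartialChunkChecksum(checksum);
        return replica;
    }
}
```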

> Optimize disk access for last partial chunk checksum of Finalized replica
> -------------------------------------------------------------------------
>                 Key: HDFS-11187
>                 URL: https://issues.apache.org/jira/browse/HDFS-11187
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Gabor Bota
>            Priority: Major
>             Fix For: 3.1.0, 3.0.2
>         Attachments: HDFS-11187-branch-2.001.patch, HDFS-11187.001.patch, HDFS-11187.002.patch,
HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the metafile when
there are concurrent writers.
> However, the implementation is not optimal, because it must always read the last partial
chunk checksum from disk, while holding the FsDatasetImpl lock, for every reader. It is possible
to optimize this by keeping an up-to-date copy of the last partial chunk checksum in memory,
reducing disk access.
> I am separating the optimization into a new jira, because maintaining the state of the in-memory
checksum requires a lot more work.
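For context, the optimization comes down to locating the checksum of the block's last partial chunk inside the meta file, so it can be read once and cached. A minimal standalone sketch of that arithmetic (assuming the standard 7-byte BlockMetadataHeader; the method names are made up for illustration):

```java
import java.util.Arrays;

class LastChunkChecksum {
    // Header of an HDFS block meta file (BlockMetadataHeader):
    // 2-byte version + 1-byte checksum type + 4-byte bytesPerChecksum.
    static final int HEADER_SIZE = 7;

    // Offset, within the meta file, of the checksum covering the last
    // partial chunk of a block; -1 if the block ends exactly on a chunk
    // boundary and therefore has no partial chunk.
    static long lastPartialChunkChecksumOffset(long blockLen,
                                               int bytesPerChecksum,
                                               int checksumSize) {
        if (blockLen % bytesPerChecksum == 0) {
            return -1; // no partial chunk to cache
        }
        long fullChunks = blockLen / bytesPerChecksum;
        return HEADER_SIZE + fullChunks * checksumSize;
    }

    // Reads that checksum out of an in-memory copy of the meta file
    // contents; null if there is no partial chunk.
    static byte[] readLastPartialChunkChecksum(byte[] metaBytes,
                                               long blockLen,
                                               int bytesPerChecksum,
                                               int checksumSize) {
        long off = lastPartialChunkChecksumOffset(
            blockLen, bytesPerChecksum, checksumSize);
        if (off < 0) {
            return null;
        }
        return Arrays.copyOfRange(metaBytes, (int) off,
                                  (int) off + checksumSize);
    }
}
```

With bytesPerChecksum=512 and a 4-byte CRC, a 1000-byte block has one full chunk, so its partial-chunk checksum sits at byte 7 + 1*4 = 11 of the meta file; caching those 4 bytes in the replica object is what lets readers skip the disk seek.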

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
