hadoop-hdfs-issues mailing list archives

From "Yan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14147) Backport of HDFS-13056 to the 2.9 branch: "Expose file-level composite CRCs in HDFS"
Date Wed, 09 Jan 2019 01:46:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737739#comment-16737739 ]

Yan commented on HDFS-14147:
----------------------------

Thanks [~vrushalic] for the hints on triggering Jenkins. I have tried the suggested options
over the past few days, but none of them seemed to work.

As for your questions about the feature, my answers are as follows:

1) The feature is not what makes comparison possible; one can always compare checksums. But
without this feature, the comparison is not meaningful between HDFS files with different block
sizes or chunk sizes, or between an HDFS file and a file on a different storage system, etc.

2) The feature only affects on-the-fly runtime behavior and has no persistent impact. The default
HDFS client behavior is still the old checksum computation approach. So there are no version
compatibility issues between new HDFS software and existing persistent HDFS data.

3) Again, by default the HDFS client uses the old "MD5MD5CRC" algorithm to compute the HDFS file
checksum; the new "composite CRC" algorithm has to be enabled explicitly by setting the
dfs.checksum.combine.mode configuration flag to COMPOSITE_CRC.
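
To illustrate point 1, here is a minimal, self-contained Java sketch. It is not the HDFS implementation: it uses java.util.zip.CRC32 and a single MD5 over per-chunk CRCs as a simplified stand-in for the "MD5MD5CRC" idea, and the class and method names (ChecksumLayoutDemo, md5OfChunkCrcs, streamedCrc) are made up for this example. It only shows the general property that a digest built over per-chunk CRCs changes with the chunk size, while a CRC over the whole byte stream does not.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChecksumLayoutDemo {

    // Layout-dependent: MD5 over the sequence of per-chunk CRC32 values.
    // (Simplified stand-in; the chunking here is an assumption of this sketch.)
    static byte[] md5OfChunkCrcs(byte[] data, int chunkSize) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            CRC32 chunkCrc = new CRC32();
            chunkCrc.update(data, off, len);
            // Feed the 4-byte CRC of this chunk into the MD5.
            md5.update(ByteBuffer.allocate(4).putInt((int) chunkCrc.getValue()).array());
        }
        return md5.digest();
    }

    // Layout-independent: one CRC32 over the whole stream, fed chunk by chunk.
    static long streamedCrc(byte[] data, int chunkSize) {
        CRC32 crc = new CRC32();
        for (int off = 0; off < data.length; off += chunkSize) {
            crc.update(data, off, Math.min(chunkSize, data.length - off));
        }
        return crc.getValue();
    }

    // Deterministic sample content so runs are repeatable.
    static byte[] sampleData(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) (i * 31 + 7);
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] data = sampleData(4096);
        boolean digestsEqual =
            Arrays.equals(md5OfChunkCrcs(data, 512), md5OfChunkCrcs(data, 1024));
        boolean crcsEqual = streamedCrc(data, 512) == streamedCrc(data, 1024);
        System.out.println("MD5-of-chunk-CRCs equal across chunk sizes: " + digestsEqual);
        System.out.println("Whole-stream CRC equal across chunk sizes: " + crcsEqual);
    }
}
```

The same bytes give different MD5-of-CRCs digests when the chunk size changes, which is why comparing such checksums across different layouts or storage systems is not meaningful, whereas the whole-stream CRC stays the same.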

> Backport of HDFS-13056 to the 2.9 branch: "Expose file-level composite CRCs in HDFS"
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-14147
>                 URL: https://issues.apache.org/jira/browse/HDFS-14147
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, distcp, hdfs
>    Affects Versions: 2.9.0, 2.9.1, 2.9.2
>            Reporter: Yan
>            Priority: Major
>         Attachments: HDFS-14147-branch-2.9-001.patch, HDFS-14147-branch-2.9-001.patch,
HDFS-14147.pdf
>
>
> HDFS-13056, "Expose file-level composite CRCs in HDFS which are comparable across different
instances/layouts", is a significant feature for storage-agnostic CRC comparisons between
HDFS and cloud object stores such as S3 and GCS. Given the extensive installed base of Hadoop
2, it makes a lot of sense to have the feature in Hadoop 2.
> The plan is to start with the backport to 2.9, followed by 2.8 and 2.7, in that order.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

