hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "genericqa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
Date Fri, 02 Mar 2018 04:14:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383143#comment-16383143
] 

genericqa commented on HDFS-13056:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 28s{color} | {color:blue}
Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color}
| {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 30s{color} | {color:blue}
Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 49s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 24s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 28s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 25s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 49s{color}
| {color:green} branch has no errors when building and testing our client artifacts. {color}
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 37s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 37s{color} |
{color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 18s{color} | {color:blue}
Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 26s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m  4s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 12m  4s{color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m  4s{color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 15s{color}
| {color:green} root: The patch generated 0 new + 609 unchanged - 1 fixed = 609 total (was
610) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  1s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 18s{color}
| {color:green} patch has no errors when building and testing our client artifacts. {color}
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 37s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 33s{color} |
{color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 17s{color} | {color:green}
hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 35s{color} | {color:green}
hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 43s{color} | {color:red}
hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 38s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}202m 50s{color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HDFS-13056 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12912701/HDFS-13056.004.patch
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient
 findbugs  checkstyle  cc  |
| uname | Linux 3947910dbb4b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 923e177 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23256/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23256/testReport/ |
| Max. process+thread count | 3996 (vs. ulimit of 10000) |
| modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client
hadoop-hdfs-project/hadoop-hdfs U: . |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23256/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13056
>                 URL: https://issues.apache.org/jira/browse/HDFS-13056
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, distcp, erasure-coding, federation, hdfs
>    Affects Versions: 3.0.0
>            Reporter: Dennis Huo
>            Priority: Major
>         Attachments: HDFS-13056-branch-2.8.001.patch, HDFS-13056-branch-2.8.poc1.patch,
HDFS-13056.001.patch, HDFS-13056.002.patch, HDFS-13056.003.patch, HDFS-13056.003.patch, HDFS-13056.004.patch,
Reference_only_zhen_PPOC_hadoop2.6.X.diff, hdfs-file-composite-crc32-v1.pdf, hdfs-file-composite-crc32-v2.pdf,
hdfs-file-composite-crc32-v3.pdf
>
>
> FileChecksum was first introduced in [https://issues-test.apache.org/jira/browse/HADOOP-3981] and
ever since then has remained defined as MD5-of-MD5-of-CRC, where per-512-byte chunk CRCs are
already stored as part of datanode metadata, and the MD5 approach is used to compute an aggregate
value in a distributed manner, with individual datanodes computing the MD5-of-CRCs per-block
in parallel, and the HDFS client computing the second-level MD5.
>  
> A shortcoming of this approach which is often brought up is the fact that this FileChecksum
is sensitive to the internal block-size and chunk-size configuration, and thus different
HDFS files with different block/chunk settings cannot be compared. More commonly, one might
have different HDFS clusters which use different block sizes, in which case any data migration
won't be able to use the FileChecksum for distcp's rsync functionality or for verifying end-to-end
data integrity (on top of low-level data integrity checks applied at data transfer time).
>  
> This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 during the
addition of checksum support for striped erasure-coded files; while there was some discussion
of using CRC composability, it still ultimately settled on hierarchical MD5 approach, which also adds
the problem that checksums of basic replicated files are not comparable to striped files.
>  
> This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses CRC composition
to remain completely chunk/block agnostic, and allows comparison between striped vs replicated
files, between different HDFS instances, and possible even between HDFS and other external
storage systems. This feature can also be added in-place to be compatible with existing block
metadata, and doesn't need to change the normal path of chunk verification, so is minimally
invasive. This also means even large preexisting HDFS deployments could adopt this feature
to retroactively sync data. A detailed design document can be found here: https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message