hadoop-hdfs-dev mailing list archives

From Akira AJISAKA <ajisa...@oss.nttdata.co.jp>
Subject Re: DistCp CRC failure modes
Date Wed, 27 Apr 2016 16:26:26 GMT
(Added hdfs-dev ML)

Thanks Elliot for reporting this issue.

I don't think this is by design, so we should fix it.
Would you file a JIRA for this issue?

If you don't have time to do so, I'll file it on your behalf.


On 4/27/16 22:43, Elliot West wrote:
> Hello,
> We are using DistCp V2 to replicate data between two HDFS file systems.
> We were working on the assumption that we could rely on CRC checks to
> ensure that the data was replicated correctly. However, after examining
> the DistCp source code it seems that there are edge cases where the CRCs
> could differ and yet the copy succeeds even when we are not skipping CRC
> checks.
> I'm wondering whether this is by design and if so, the reasoning behind
> it? If this is a bug, I'd like to raise an issue to fix it. If it is by
> design, I'd like to propose the introduction of an option for stricter CRC
> checks.
> The code in question is contained in the method:
>     org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)
> which can be seen here:
>     https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457
> Specifically this code block suggests that if there is a failure when
> trying to read the source or target checksum then the method will return
> 'true', implying that the check succeeded. In actual fact we just failed
> to obtain the checksum and could perform no check.
>      try {
>        sourceChecksum = sourceChecksum != null ? sourceChecksum : sourceFS
>            .getFileChecksum(source);
>        targetChecksum = targetFS.getFileChecksum(target);
>      } catch (IOException e) {
>        LOG.error("Unable to retrieve checksum for " + source + " or " +
>            target, e);
>      }
>      return (sourceChecksum == null || targetChecksum == null ||
>              sourceChecksum.equals(targetChecksum));
> Ideally I'd like to be able to configure a check where we require that
> both the source and target CRCs are retrieved and compared, and if for
> any reason either of the CRC retrievals fails then an exception is
> thrown. I do appreciate that some FileSystems cannot return CRCs but
> these could still be handled correctly as they would simply return null
> and not throw an exception (I assume).
> I'd appreciate any thoughts on this matter.
> Elliot.
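
For illustration only, the stricter check proposed above might look roughly like the sketch below. It is not the DistCp code and not a proposed patch: ChecksumSource is a hypothetical stand-in for a call such as FileSystem#getFileChecksum, checksums are modelled as plain strings, and the three-way Result is an assumption made here so that "checksum unavailable" (null) is kept distinct from "checksums match". A retrieval failure propagates as an IOException instead of being logged and treated as success.

```java
import java.io.IOException;

public class StrictChecksumCheck {

    /** Hypothetical stand-in for a checksum lookup; may return null if unsupported. */
    @FunctionalInterface
    interface ChecksumSource {
        String get() throws IOException;
    }

    /** Outcome of the comparison; UNAVAILABLE is never reported as a match. */
    enum Result { EQUAL, DIFFERENT, UNAVAILABLE }

    /**
     * Retrieves both checksums and compares them.
     * An IOException from either lookup is deliberately not caught here,
     * so a failed retrieval cannot be mistaken for a successful check.
     */
    static Result compare(ChecksumSource source, ChecksumSource target)
            throws IOException {
        String sourceChecksum = source.get();
        String targetChecksum = target.get();
        if (sourceChecksum == null || targetChecksum == null) {
            // The file system offered no checksum; the caller decides what to do.
            return Result.UNAVAILABLE;
        }
        return sourceChecksum.equals(targetChecksum)
                ? Result.EQUAL
                : Result.DIFFERENT;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(compare(() -> "abc", () -> "abc"));
        System.out.println(compare(() -> "abc", () -> "xyz"));
        System.out.println(compare(() -> "abc", () -> null));
        try {
            compare(() -> { throw new IOException("read failed"); }, () -> "abc");
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

With this shape a caller could still choose to accept UNAVAILABLE for file systems that cannot return checksums, while a strict mode would reject anything other than EQUAL.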
