hadoop-mapreduce-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: HDFS dfs.client.read.shortcircuit.skip.checksum
Date Mon, 17 Sep 2012 21:36:26 GMT
Hi LiuLei,

Since you're using CDH3 (a 1.x-derived distribution), you are using the old
checksum implementation, which is written in pure Java.

In Hadoop 2.0 (or CDH4), we have JNI-based checksumming which uses the
hardware CRC support introduced with Intel's Nehalem processors. This is
several times faster.

My guess is that this accounts for the substantial difference. You could
try re-running your test on a newer version to confirm.
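
For a rough feel of what checksum verification costs on a 64 KB read, here
is a minimal sketch. It is not Hadoop's code: java.util.zip.CRC32 stands in
for the checksum implementation, the class and method names are made up for
illustration, and the 512-byte chunk size assumes the usual HDFS default for
bytes-per-checksum. Timings from it say nothing about CDH3 versus CDH4
directly; it only shows the shape of the per-chunk work.

```java
import java.util.Random;
import java.util.zip.CRC32;

public class ChunkedChecksumSketch {
    // Assumption: 512 bytes per checksum chunk (the usual HDFS default).
    static final int BYTES_PER_CHECKSUM = 512;

    /** Computes one CRC32 value per chunk over the first len bytes of buf. */
    static long[] checksumChunks(byte[] buf, int len) {
        int chunks = (len + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32(); // stand-in, not Hadoop's implementation
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int n = Math.min(BYTES_PER_CHECKSUM, len - off);
            crc.reset();
            crc.update(buf, off, n);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[64 * 1024]; // one 64 KB read, as in the test
        new Random(42).nextBytes(buf);

        int iterations = 10_000;
        long sink = 0; // keeps the JIT from discarding the work
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += checksumChunks(buf, buf.length)[0];
        }
        long elapsedNs = System.nanoTime() - start;

        double mb = iterations * (buf.length / (1024.0 * 1024.0));
        System.out.printf("checksummed %.0f MB in %.1f ms (sink=%d)%n",
                mb, elapsedNs / 1e6, sink);
    }
}
```

Running the same loop with the checksum call removed gives a baseline, which
is roughly the comparison that skipping checksums makes on the read path.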


On Sat, Sep 15, 2012 at 7:13 AM, jlei liu <liulei412@gmail.com> wrote:

> I read 64 KB of data from the file each time.

Todd Lipcon
Software Engineer, Cloudera
