hadoop-common-dev mailing list archives

From Da Zheng <zhengda1...@gmail.com>
Subject Re: Hadoop use direct I/O in Linux?
Date Thu, 06 Jan 2011 02:02:57 GMT
Isn't DataChecksum just a wrapper around CRC32?
I'm still using Hadoop 0.20.2, which has no PureJavaCrc32.

Da

On 1/5/11 7:44 PM, Milind Bhandarkar wrote:
> Have you tried with org.apache.hadoop.util.DataChecksum and org.apache.hadoop.util.PureJavaCrc32?
> 
> - Milind
> 
> On Jan 5, 2011, at 3:42 PM, Da Zheng wrote:
> 
>> I'm not sure about that. I wrote a small checksum program to test it. Once the size
>> of a block grows beyond 8192 bytes, I don't see much further performance improvement. See the
>> code below. I don't think 64MB would bring us any benefit.
>> I did change io.bytes.per.checksum to 131072 in Hadoop, and the program ran about
>> 4 or 5 minutes faster (the total time for reducing is about 35 minutes).
>>
>> import java.util.zip.CRC32;
>> import java.util.zip.Checksum;
>>
>>
>> public class Test1 {
>>    public static void main(String[] args) {
>>        Checksum sum = new CRC32();
>>        // Chunk size fed to each update() call; vary this to compare sizes.
>>        byte[] bs = new byte[512];
>>        final int tot_size = 64 * 1024 * 1024; // total bytes checksummed (64MB)
>>        long time = System.nanoTime();
>>        for (int k = 0; k < tot_size / bs.length; k++) {
>>            // Note: refilling the buffer is included in the measured time.
>>            for (int i = 0; i < bs.length; i++)
>>                bs[i] = (byte) i;
>>            sum.update(bs, 0, bs.length);
>>        }
>>        // Elapsed time, converted from nanoseconds to milliseconds.
>>        System.out.println("takes " + (System.nanoTime() - time) / 1000 / 1000 + " ms");
>>    }
>> }
>>
>>
>> On 01/05/2011 05:03 PM, Milind Bhandarkar wrote:
>>> I agree with Jay B. Checksumming is usually the culprit for high CPU on clients
>>> and datanodes. Plus, a checksum of 4 bytes for every 512 bytes means that for a 64MB block
>>> the checksum data will be 512KB, i.e. 128 ext3 blocks. Changing it to generate 1 ext3
>>> checksum block per DFS block will speed up read/write without any loss of reliability.
>>>
>>> - milind
>>>
>>> ---
>>> Milind Bhandarkar
>>> (mbhandarkar@linkedin.com)
>>> (650-776-3236)
>>>
>>>
>>>
>>>
>>>
>>>
>>
> 
> ---
> Milind Bhandarkar
> (mbhandarkar@linkedin.com)
> (650-776-3236)
> 
> 
> 
> 
> 
> 

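The checksum-overhead arithmetic in Milind's quoted message (4 bytes per 512 bytes, 512KB per 64MB block, 128 ext3 blocks) can be checked with a small sketch. The class and method names here are hypothetical, and the 4KB ext3 block size is an assumption implied by the "128 ext3 blocks" figure, not something the thread states outright.

```java
public class ChecksumOverhead {
    // Bytes of CRC32 checksum data needed for one block of blockSize bytes
    // when one checksum is stored per bytesPerChecksum bytes of data.
    static long checksumBytes(long blockSize, int bytesPerChecksum) {
        long chunks = (blockSize + bytesPerChecksum - 1) / bytesPerChecksum; // round up
        return chunks * 4; // a CRC32 value is 4 bytes
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // 64MB DFS block
        int fsBlock = 4096;                 // assumed 4KB ext3 block
        for (int bpc : new int[] {512, 8192, 131072}) {
            long bytes = checksumBytes(blockSize, bpc);
            long fsBlocks = (bytes + fsBlock - 1) / fsBlock; // round up
            System.out.println(bpc + " bytes per checksum: " + bytes
                    + " checksum bytes, " + fsBlocks + " ext3 block(s)");
        }
    }
}
```

Under these assumptions, 512 bytes per checksum gives 524288 checksum bytes (512KB, 128 blocks of 4KB), while the 131072 setting mentioned in the thread gives 2048 bytes, which fits in a single 4KB block.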

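To compare several chunk sizes in one run, the quoted Test1 program could be extended roughly as follows. This is only a sketch: it uses plain java.util.zip.CRC32 rather than Hadoop's own checksum code path, the class and method names are hypothetical, and, unlike Test1, it fills the data once outside the timed region and resets the CRC for every chunk so that each chunk gets its own checksum. Timings will vary by JVM and machine, so no numbers are claimed here.

```java
import java.util.zip.CRC32;

public class ChunkSizeTiming {
    // Checksums data in independent chunks of chunkSize bytes, resetting the
    // CRC before each chunk, and returns how many checksums were computed.
    static int checksumInChunks(byte[] data, int chunkSize) {
        CRC32 crc = new CRC32();
        int count = 0;
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            crc.getValue(); // the per-chunk checksum; discarded in this sketch
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 64MB of test data, filled once so the fill is not part of the timing.
        byte[] data = new byte[64 * 1024 * 1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        for (int chunkSize : new int[] {512, 8192, 131072}) {
            long start = System.nanoTime();
            int checksums = checksumInChunks(data, chunkSize);
            long elapsedMs = (System.nanoTime() - start) / 1000 / 1000;
            System.out.println("chunk " + chunkSize + ": " + checksums
                    + " checksums, takes " + elapsedMs + " ms");
        }
    }
}
```

A single pass like this includes JIT warm-up in the first measurement, so repeating the loop a few times and ignoring the first round would give steadier numbers.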