hadoop-common-dev mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: CRC32 performance
Date Thu, 09 Oct 2008 06:38:45 GMT

Datanodes and persistent storage can deal with different checksums, but 
the client does not support that yet (this is easier to fix, since the 
client is not tied to persistent data).
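
To illustrate what "dealing with different checksums" can look like, here 
is a minimal sketch built on the JDK's java.util.zip.Checksum interface. 
This is not the HDFS code: the class ChecksumChoice, the factory method 
create(), the name strings and the ZeroChecksum stub are all hypothetical 
names chosen for this example. The stub follows the "fill things with 
zeros" idea quoted further down and is only suitable for experiments.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumChoice {

    /** Hypothetical stub: ignores its input and always reports zero. */
    static final class ZeroChecksum implements Checksum {
        @Override public void update(int b) { }
        @Override public void update(byte[] b, int off, int len) { }
        @Override public long getValue() { return 0L; }
        @Override public void reset() { }
    }

    /** Picks an implementation by name (the names are assumptions of this sketch). */
    static Checksum create(String name) {
        switch (name) {
            case "crc32":   return new CRC32();
            case "adler32": return new Adler32();
            case "zero":    return new ZeroChecksum();
            default:
                throw new IllegalArgumentException("unknown checksum: " + name);
        }
    }

    /** Checksums a whole buffer with the named implementation. */
    static long checksum(String name, byte[] data) {
        Checksum sum = create(name);
        sum.update(data, 0, data.length);
        return sum.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        for (String name : new String[] {"crc32", "adler32", "zero"}) {
            System.out.printf("%-8s %08x%n", name, checksum(name, data));
        }
    }
}
```

Because callers only see the Checksum interface, swapping the algorithm 
is a one-line change at the point where the object is created.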

Regarding CPU comparisons, the most reliable approach I have found is 
either to max out the CPU on a machine and compare the time taken, or to 
compare the CPU time reported in /proc/pid/stat (see 
https://issues.apache.org/jira/browse/HADOOP-1702?focusedCommentId=12575553#action_12575553 
for an example).
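
As a rough sketch of comparing CPU time rather than wall-clock time, the 
following runs the same buffer through Adler32 and CRC32 and reads the 
per-thread CPU time from the JVM's ThreadMXBean. That is a stand-in for 
reading /proc/pid/stat, not the method used in the JIRA comment. The class 
name ChecksumCpuBench, the buffer size and the pass counts are arbitrary 
choices for this example, and numbers from a current JVM may look quite 
different from the 2008 figures quoted below.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Random;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumCpuBench {

    /**
     * CPU time in nanoseconds used by the current thread while checksumming
     * `data` `passes` times. getCurrentThreadCpuTime() returns -1 if CPU time
     * measurement is disabled, in which case the difference is 0.
     */
    static long cpuNanos(Checksum sum, byte[] data, int passes) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long start = bean.getCurrentThreadCpuTime();
        for (int i = 0; i < passes; i++) {
            sum.reset();
            sum.update(data, 0, data.length);
        }
        return bean.getCurrentThreadCpuTime() - start;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];      // 1 MiB of pseudo-random input
        new Random(42).nextBytes(data);

        // Warm-up passes so the JIT has compiled both code paths.
        cpuNanos(new Adler32(), data, 50);
        cpuNanos(new CRC32(), data, 50);

        long adler = cpuNanos(new Adler32(), data, 500);
        long crc = cpuNanos(new CRC32(), data, 500);

        System.out.printf("Adler32: %d ms CPU%n", adler / 1_000_000);
        System.out.printf("CRC32:   %d ms CPU%n", crc / 1_000_000);
        System.out.printf("Ratio:   %.4f%n", (double) adler / crc);
    }
}
```

Thread CPU time excludes time the thread spends descheduled, so it is 
less sensitive to other load on the machine than elapsed time is.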

Raghu.

Bryan Duxbury wrote:
> I put together a small benchmark app just for the two CRC algorithms 
> alone (code available on request). I run the same amount of data through 
> each one in exactly the same pattern. The results look like this:
> 
> Adler32: 1983 ms
> CRC32: 6514 ms
> Ratio: 0.30442125
> 
> The ratio holds for different lengths of tests, too. This would seem to 
> indicate that there's a fair bit of benefit to be extracted from 
> upgrading to Adler. From looking at the HDFS code, it even seems like 
> it's designed to work with different implementations of Checksum, so it 
> doesn't seem like it would be hard to use this instead.
> 
> I might still take the time to build an isolated benchmark that's 
> actually the hadoop code, but I thought I'd share these intermediate 
> results.
> 
> -Bryan
> 
> On Oct 7, 2008, at 10:31 AM, Doug Cutting wrote:
> 
>> Don't try this on anything but an experimental filesystem.  If you can 
>> simply find the places where HDFS calls the CRC algorithm and replace 
>> them with zeros, then you should be able to get a reasonable benchmark.
>>
>> Doug
>>
>> Bryan Duxbury wrote:
>>> I'm willing to give this a shot. Let me just be sure I understand 
>>> what I'd have to do: if I make it stop computing CRCs altogether, I 
>>> need to make changes in the datanode as well, right? To stop checking 
>>> validity of CRCs? Will this break anything interesting and unexpected?
>>> On Oct 6, 2008, at 4:58 PM, Doug Cutting wrote:
>>>> Bryan Duxbury wrote:
>>>>> I am profiling with YourKit on random reducers. I'm also running on 
>>>>> HDFS, so I don't know how one would go about disabling CRCs.
>>>>
>>>> Hack the CRC-computing code to fill things with zeros?
>>>>
>>>> Doug
> 

