hadoop-hdfs-user mailing list archives

From praveenesh kumar <praveen...@gmail.com>
Subject Re: HDFS copyToLocal and get crc option
Date Fri, 31 Jan 2014 17:14:42 GMT
Hi Tom,

My hint is that your block size should be a multiple of the CRC chunk size. Check your
dfs.block.size property, convert it to bytes, and divide it by the checksum chunk
size. That size is usually set by the dfs.bytes-per-checksum property, or you can
read it from the error message you are seeing.

HDFS uses this checksum value to make sure the data doesn't get corrupted
during transfer (due to loss of bytes etc.).

I hope setting your block size to a multiple of your CRC checksum size
will solve your problem.
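The divisibility check described above can be sketched as follows (a minimal illustration; the sample values below are common defaults, not read from any real cluster configuration):

```python
def block_size_is_multiple(block_size_bytes: int, bytes_per_checksum: int) -> bool:
    """Return True when the HDFS block size divides evenly into checksum chunks,
    i.e. dfs.block.size is an exact multiple of dfs.bytes-per-checksum."""
    return block_size_bytes % bytes_per_checksum == 0

# Example: a 128 MB block size with the typical 512-byte checksum chunk.
print(block_size_is_multiple(128 * 1024 * 1024, 512))  # True: evenly divisible
# A block size that is not a multiple of the checksum chunk size:
print(block_size_is_multiple(100_000_000, 512))        # False: remainder of 256 bytes
```

If the second case applies to your cluster, adjusting dfs.block.size so the division leaves no remainder is the change suggested above.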


On Fri, Jan 31, 2014 at 4:30 PM, Tom Brown <tombrown52@gmail.com> wrote:

> What is the right way to use the "-crc" option with hadoop dfs
> -copyToLocal?
> Is this the wrong list?
> --Tom
> On Tue, Jan 28, 2014 at 11:53 AM, Tom Brown <tombrown52@gmail.com> wrote:
>> I am archiving a large amount of data out of my HDFS file system to a
>> separate shared storage solution (There is not much HDFS space left in my
>> cluster, and upgrading it is not an option right now).
>> I understand that HDFS internally manages checksums and won't succeed if
>> the data doesn't match the CRC, so I'm not worried about corruption when
>> reading from HDFS.
>> However, I want to store the HDFS crc calculations alongside the data
>> files after exporting them. I thought the "hadoop dfs -copyToLocal -crc
>> <hdfs-source> <local-dest>" command would work, but it always gives me
>> error "-crc option is not valid when source file system does not have crc
>> files"
>> Can someone explain what exactly that option does, and when (if ever) it
>> should be used?
>> Thanks in advance!
>> --Tom
