hadoop-mapreduce-user mailing list archives

From Tom Brown <tombrow...@gmail.com>
Subject Re: HDFS copyToLocal and get crc option
Date Fri, 31 Jan 2014 17:58:04 GMT
I am using default values for both. My version is 1.1.2, and the default
value for "dfs.block.size" (67108864) is evenly divisible by 512.
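
For reference, the arithmetic behind that check can be sketched as follows. This is only a minimal illustration: the 512-byte chunk size is the value discussed in this thread, and the function name is made up for the example, not part of any Hadoop API.

```python
# Values taken from this thread; the helper name is hypothetical.
BLOCK_SIZE = 67108864      # default dfs.block.size in bytes (64 MiB)
BYTES_PER_CHECKSUM = 512   # assumed checksum chunk size in bytes


def is_multiple(block_size: int, bytes_per_checksum: int) -> bool:
    """Return True when block_size is an exact multiple of bytes_per_checksum."""
    return block_size % bytes_per_checksum == 0


# Number of checksum chunks that fit in one block.
chunks_per_block = BLOCK_SIZE // BYTES_PER_CHECKSUM

print(is_multiple(BLOCK_SIZE, BYTES_PER_CHECKSUM), chunks_per_block)
```

With these values the block size divides evenly, so the block size itself does not look like the cause of the error.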

However, the online reference of default values for my version (
http://hadoop.apache.org/docs/r1.1.2/hdfs-default.html) doesn't list any
checksum-related settings.

Was the implementation of the checksum feature added recently?


On Fri, Jan 31, 2014 at 10:14 AM, praveenesh kumar <praveenesh@gmail.com> wrote:

> Hi Tom,
> My hint is that your block size should be a multiple of the CRC checksum
> value. Check your dfs.block.size property, convert it into bytes, then
> divide it by the checksum value that is set. Usually the
> dfs.bytes-per-checksum property holds this value, or you can get it from
> the error message you are getting.
> HDFS uses this checksum value to make sure the data doesn't get corrupted
> during transfer (due to loss of bytes, etc.).
> I hope setting your block size to a multiple of your CRC checksum value
> will solve your problem.
> Regards
> Prav
> On Fri, Jan 31, 2014 at 4:30 PM, Tom Brown <tombrown52@gmail.com> wrote:
>> What is the right way to use the "-crc" option with hadoop dfs
>> -copyToLocal?
>> Is this the wrong list?
>> --Tom
>> On Tue, Jan 28, 2014 at 11:53 AM, Tom Brown <tombrown52@gmail.com> wrote:
>>> I am archiving a large amount of data out of my HDFS file system to a
>>> separate shared storage solution (There is not much HDFS space left in my
>>> cluster, and upgrading it is not an option right now).
>>> I understand that HDFS internally manages checksums and won't succeed if
>>> the data doesn't match the CRC, so I'm not worried about corruption when
>>> reading from HDFS.
>>> However, I want to store the HDFS crc calculations alongside the data
>>> files after exporting them. I thought the "hadoop dfs -copyToLocal -crc
>>> <hdfs-source> <local-dest>" command would work, but it always gives me the
>>> error "-crc option is not valid when source file system does not have crc
>>> files"
>>> Can someone explain what exactly that option does, and when (if ever) it
>>> should be used?
>>> Thanks in advance!
>>> --Tom
