hadoop-hdfs-user mailing list archives

From Tom Brown <tombrow...@gmail.com>
Subject Re: HDFS copyToLocal and get crc option
Date Fri, 31 Jan 2014 17:58:04 GMT
I am using default values for both. My version is 1.1.2, and the default
value for "dfs.block.size" (67108864) is evenly divisible by 512.
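The divisibility is easy to confirm directly (a quick sketch; 67108864 is the stock dfs.block.size and 512 the stock dfs.bytes-per-checksum in this release):

```python
# Default dfs.block.size and dfs.bytes-per-checksum in Hadoop 1.1.2
block_size = 67108864        # 64 MB
bytes_per_checksum = 512     # CRC chunk size

# The block size must be an exact multiple of the checksum chunk size
print(block_size % bytes_per_checksum == 0)   # True
print(block_size // bytes_per_checksum)       # 131072 checksum chunks per block
```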

However, the online configuration reference for my version (
http://hadoop.apache.org/docs/r1.1.2/hdfs-default.html) doesn't list any
checksum-related settings.

Was the checksum feature added recently?

--Tom


On Fri, Jan 31, 2014 at 10:14 AM, praveenesh kumar <praveenesh@gmail.com>wrote:

> Hi Tom,
>
> My hint is that your block size should be a multiple of the CRC chunk size.
> Check your dfs.block.size property, convert it to bytes, and divide it by
> the checksum chunk size that is set. That value usually comes from the
> dfs.bytes-per-checksum property, or you can read it from the error message
> you are getting.
>
> HDFS uses this checksum value to make sure the data doesn't get corrupted
> in transit (due to loss of bytes etc.).
>
> I hope setting your block size to a multiple of your CRC chunk size
> will solve your problem.
>
> Regards
> Prav
>
>
> On Fri, Jan 31, 2014 at 4:30 PM, Tom Brown <tombrown52@gmail.com> wrote:
>
>> What is the right way to use the "-crc" option with hadoop dfs
>> -copyToLocal?
>>
>> Is this the wrong list?
>>
>> --Tom
>>
>>
>> On Tue, Jan 28, 2014 at 11:53 AM, Tom Brown <tombrown52@gmail.com> wrote:
>>
>>> I am archiving a large amount of data out of my HDFS file system to a
>>> separate shared storage solution (There is not much HDFS space left in my
>>> cluster, and upgrading it is not an option right now).
>>>
>>> I understand that HDFS internally manages checksums and won't succeed if
>>> the data doesn't match the CRC, so I'm not worried about corruption when
>>> reading from HDFS.
>>>
>>> However, I want to store the HDFS crc calculations alongside the data
>>> files after exporting them. I thought the "hadoop dfs -copyToLocal -crc
>>> <hdfs-source> <local-dest>" command would work, but it always gives me the
>>> error "-crc option is not valid when source file system does not have crc
>>> files"
>>>
>>> Can someone explain what exactly that option does, and when (if ever) it
>>> should be used?
>>>
>>> Thanks in advance!
>>>
>>> --Tom
>>>
>>
>>
>
