hadoop-mapreduce-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: distcp fails with "source and target differ in block-size"
Date Tue, 24 May 2016 16:53:54 GMT
Hello Dmitry,

To clarify, the intent of MAPREDUCE-5065 was to notify the user that
using different block sizes on source and destination might cause a
failure due to a checksum mismatch.  The message to the user recommends either
the -pb (preserve block size) or -skipCrc (skip checksum validation) as
potential workarounds.  The intent of that patch was not to silently
proceed and report success when the block sizes are different, although
there was some discussion of that on the issue as a proposed solution.
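
As a rough illustration of why a block-size change trips the checksum
comparison: in Hadoop 2.x the default HDFS file checksum is an
MD5-of-MD5-of-CRC digest, so it depends on where the block boundaries fall,
not only on the file's bytes. The sketch below is a simplified model under
stated assumptions, not HDFS code. It uses Python's zlib.crc32 (HDFS defaults
to CRC32C), a fixed 512-byte chunk size, and the hypothetical function names
block_md5 and file_checksum.

```python
import hashlib
import zlib

# Simplified model (assumption, not HDFS source): one CRC per 512-byte chunk,
# one MD5 per block over that block's CRCs, then one MD5 over all block MD5s.
BYTES_PER_CRC = 512


def block_md5(block: bytes) -> bytes:
    """MD5 over the big-endian CRC32 of each chunk in a single block."""
    crcs = b"".join(
        zlib.crc32(block[i:i + BYTES_PER_CRC]).to_bytes(4, "big")
        for i in range(0, len(block), BYTES_PER_CRC)
    )
    return hashlib.md5(crcs).digest()


def file_checksum(data: bytes, block_size: int) -> str:
    """MD5 over the concatenated per-block MD5 digests, as a hex string."""
    block_digests = b"".join(
        block_md5(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    )
    return hashlib.md5(block_digests).hexdigest()


if __name__ == "__main__":
    data = bytes(i % 251 for i in range(16384))  # 16 KiB of sample bytes
    small = file_checksum(data, 4096)  # 4 blocks
    large = file_checksum(data, 8192)  # 2 blocks
    print(small == large)  # False: same bytes, different block boundaries
```

Identical bytes written with a different block size yield a different
file-level digest, which is the kind of mismatch DistCp reports when it
compares source and target checksums after the copy.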

To the best of my knowledge, this behavior hasn't really changed.  Only
the messaging to the user has changed to advise on some potential
workarounds.
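
To make the two workarounds concrete, here is a hypothetical Python helper,
distcp_argv, that only builds `hadoop distcp` argument lists and does not run
anything. It assumes the Hadoop 2.x option spellings: -pb preserves block
size, and the option the message calls -skipCrc is spelled -skipcrccheck on
the command line, which (as far as I know) DistCp accepts only together with
-update. The paths and the 256 MiB size are placeholders.

```python
from typing import List, Optional


def distcp_argv(
    src: str,
    dst: str,
    block_size: Optional[int] = None,
    preserve_block_size: bool = False,
    skip_crc: bool = False,
) -> List[str]:
    """Build (but do not run) a `hadoop distcp` command line.

    Hypothetical helper; option spellings assume Hadoop 2.x DistCp.
    """
    if block_size is not None and preserve_block_size:
        # -pb would override the requested size, so refuse the combination.
        raise ValueError("use either block_size or preserve_block_size, not both")
    argv = ["hadoop", "distcp"]
    if block_size is not None:
        # Generic -D option: block size for files written on the target.
        argv.append(f"-Ddfs.blocksize={block_size}")
    if preserve_block_size:
        argv.append("-pb")  # keep each source file's block size
    if skip_crc:
        # Skip the post-copy checksum comparison; assumed to require -update.
        argv += ["-update", "-skipcrccheck"]
    argv += [src, dst]
    return argv


if __name__ == "__main__":
    src, dst = "hdfs://nn1/data", "hdfs://nn2/data"  # placeholder paths
    new_size = 256 * 1024 * 1024  # 256 MiB, an arbitrary example value
    # 1. Change block size with checksum validation (reported to fail here).
    print(" ".join(distcp_argv(src, dst, block_size=new_size)))
    # 2. Workaround: preserve the source block size instead.
    print(" ".join(distcp_argv(src, dst, preserve_block_size=True)))
    # 3. Workaround: keep the new block size but skip the CRC comparison.
    print(" ".join(distcp_argv(src, dst, block_size=new_size, skip_crc=True)))
```

Note that -pb copies the source block size, so it avoids the error but does
not change the block size at all; only the -skipcrccheck variant keeps the
new dfs.blocksize, at the cost of not detecting corruption during transfer.
-update also changes how existing target files and directory paths are
treated, so it is worth checking the DistCp guide before relying on it.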

--Chris Nauroth




On 5/22/16, 10:31 AM, "Dmitry Sivachenko" <trtrmitya@gmail.com> wrote:

>
>> On 21 May 2016, at 09:34, Dmitry Sivachenko <trtrmitya@gmail.com> wrote:
>> 
>> 
>>> On 21 May 2016, at 02:15, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>>> 
>>> Hello Dmitry,
>>> 
>>> MAPREDUCE-5065 has been included in these branches for a long time.
>>> Are you certain that you passed a dfs.blocksize equal to what was used
>>> in the source files?  Did all source files use the same block size?
>>> 
>> 
>> 
>> No, I am sure that I use -D dfs.blocksize=DifferentThanSourceBlockSize
>> (I want to change it during the copy).
>> 
>> I am not sure that all source files use the same block size (there are
>> thousands of them), but isn't it wrong to report an error when I use
>> distcp to change the block size, since that is a well-documented way of
>> changing it?
>> 
>> Sorry if I am missing something.
>> 
>
>
>So to be clear: right now with Hadoop 2.7.2 I always get a "checksum
>mismatch" error when I try to distcp a file with
>-Ddfs.blocksize=DifferentBlockSize
>
>And this looks like undesired behaviour; at least some Stack Overflow
>answers suggest distcp as a way to change the block size of an existing file:
>
>http://stackoverflow.com/questions/29604823/change-block-size-of-existing-files-in-hadoop
>
>So presumably at some point in the past this did not lead to an error.



