hadoop-common-user mailing list archives

From Mungeol Heo <mungeol....@gmail.com>
Subject Re: Why is the size of a HDFS file changed?
Date Tue, 10 Jan 2017 01:37:03 GMT
BTW, these files were collected using Apache Flume.

On Tue, Jan 10, 2017 at 10:11 AM, Mungeol Heo <mungeol.heo@gmail.com> wrote:
> Yes, that's why I wonder why this one specific file causes the
> problem while the other data files of the Hive table do not.
>
> On Tue, Jan 10, 2017 at 3:42 AM, Ravi Prakash <ravihadoop@gmail.com> wrote:
>> I have not been able to reproduce this:
>>
>> [raviprak@ravi ~]$ hdfs dfs -put HuckleberryFinn.txt /
>> [raviprak@ravi ~]$ cd /tmp
>> [raviprak@ravi tmp]$ hdfs dfs -get /HuckleberryFinn.txt
>> [raviprak@ravi tmp]$ hdfs dfs -cat /HuckleberryFinn.txt > hck
>> [raviprak@ravi tmp]$ md5sum hck
>> 8dc8966178cc1bf4eb95a5b31780269c  hck
>> [raviprak@ravi tmp]$ md5sum HuckleberryFinn.txt
>> 8dc8966178cc1bf4eb95a5b31780269c  HuckleberryFinn.txt
>> [raviprak@ravi tmp]$ hdfs dfs -put hck /
>> [raviprak@ravi tmp]$ hdfs dfs -checksum /HuckleberryFinn.txt
>> /HuckleberryFinn.txt    MD5-of-0MD5-of-512CRC32C
>> 000002000000000000000000c99e8741a1f3d311513df9d9e73b0bc8
>> [raviprak@ravi tmp]$ hdfs dfs -checksum /hck
>> /hck    MD5-of-0MD5-of-512CRC32C
>> 000002000000000000000000c99e8741a1f3d311513df9d9e73b0bc8
>>
>> This is on trunk.
>>
>> On Sun, Jan 8, 2017 at 6:52 PM, Mungeol Heo <mungeol.heo@gmail.com> wrote:
>>>
>>> "^A" is used as the delimiter in the file.
>>> However, I don't think this is what causes the problem, because
>>> other files that also use "^A" as the delimiter have no problem.
>>> BTW, "^A" is used as the delimiter because these files are Hive data.
>>>
>>> On Sat, Jan 7, 2017 at 12:17 AM, Ravi Prakash <ravihadoop@gmail.com> wrote:
>>> > Is there a carriage return / new line / some other whitespace which
>>> > `cat` may be appending?
>>> >
>>> > On Thu, Jan 5, 2017 at 6:09 PM, Mungeol Heo <mungeol.heo@gmail.com> wrote:
>>> >>
>>> >> Hello,
>>> >>
>>> >> Suppose I call the HDFS file that causes the problem A.
>>> >>
>>> >> hdfs dfs -ls A
>>> >> -rw-r--r--   3 web_admin hdfs  868003931 2017-01-04 09:05 A
>>> >>
>>> >> hdfs dfs -get A AFromGet
>>> >> hdfs dfs -cat A > AFromCat
>>> >>
>>> >> ls -l
>>> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan  5 18:32 AFromGet
>>> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan  5 18:32 AFromCat
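The two sizes can be compared with a little arithmetic. The following is a minimal Python sketch, assuming a 128 MiB block size (262144 × 512 bytes, as the `MD5-of-262144MD5-of-512CRC32C` label in the checksum output implies); `BLOCK_SIZE` and `block_layout` are hypothetical names used only for illustration.

```python
# Assumed block size: 262144 CRCs per block x 512 bytes per CRC = 128 MiB.
BLOCK_SIZE = 512 * 262144


def block_layout(length: int, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Return (number of full blocks, bytes in the trailing partial block)."""
    return divmod(length, block_size)


hdfs_len = 868003931   # length reported by `hdfs dfs -ls A`
local_len = 883715443  # size of AFromGet / AFromCat after download

print(local_len - hdfs_len)    # 15711512
print(block_layout(hdfs_len))  # (6, 62697563)
print(block_layout(local_len)) # (6, 78409075)
```

Under that assumption, both lengths end inside the seventh block (six full blocks plus a partial one), so the extra 15,711,512 bytes fall within the final block rather than adding whole blocks. On its own, this does not explain the cause.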
>>> >>
>>> >> hdfs dfs -put AFromGet
>>> >>
>>> >> diff <(hdfs dfs -cat  A) <(hdfs dfs -cat AFromGet)
>>> >> (no output, which means the contents of the two files are the same, at
>>> >> least as read through "cat")
>>> >>
>>> >> hdfs dfs -checksum A
>>> >> A   MD5-of-262144MD5-of-512CRC32C
>>> >> 000002000000000000040000e667fb4f0dda78101feb2b689af8260b
>>> >>
>>> >> hdfs dfs -checksum AFromGet
>>> >> AFromGet   MD5-of-262144MD5-of-512CRC32C
>>> >> 0000020000000000000400007284759249ff98c7395e6a4bb59343dc
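The hex string printed by `hdfs dfs -checksum` can be split into fields. The following sketch assumes it is the serialized form of Hadoop's `MD5MD5CRC32FileChecksum`: a big-endian 4-byte bytes-per-CRC value, an 8-byte CRCs-per-block value, then a 16-byte MD5. That layout is consistent with the `MD5-of-262144MD5-of-512CRC32C` label printed next to it, but treat it as an assumption; `decode_hdfs_checksum` is a hypothetical helper.

```python
import struct


def decode_hdfs_checksum(hex_str: str) -> dict:
    """Split an `hdfs dfs -checksum` hex string into header fields and MD5.

    Assumed layout: int32 bytesPerCRC, int64 crcPerBlock, 16-byte MD5 (big-endian).
    """
    raw = bytes.fromhex(hex_str)
    if len(raw) != 28:
        raise ValueError(f"expected 28 bytes, got {len(raw)}")
    bytes_per_crc, crc_per_block = struct.unpack(">iq", raw[:12])
    return {
        "bytes_per_crc": bytes_per_crc,
        "crc_per_block": crc_per_block,
        "md5": raw[12:].hex(),
        # Bytes covered by one block-level MD5 (0 when crcPerBlock is 0).
        "bytes_per_block": bytes_per_crc * crc_per_block,
    }


a = decode_hdfs_checksum("000002000000000000040000e667fb4f0dda78101feb2b689af8260b")
b = decode_hdfs_checksum("0000020000000000000400007284759249ff98c7395e6a4bb59343dc")
print(a["bytes_per_crc"], a["crc_per_block"], a["bytes_per_block"])  # 512 262144 134217728
print(a["md5"] == b["md5"])  # False
```

Both A and AFromGet decode to the same header (512 bytes per CRC, 262144 CRCs per block), so the differing checksums come from the MD5 portion, not from a different checksum or block-size configuration.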
>>> >>
>>> >> Given the results listed above, I wonder why the size of the file
>>> >> changed.
>>> >> Any help would be GREAT!
>>> >>
>>> >> Thank you.
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
>>> >> For additional commands, e-mail: user-help@hadoop.apache.org
>>> >>
>>> >
>>
>>
