hadoop-user mailing list archives

From Mungeol Heo <mungeol....@gmail.com>
Subject Re: Why is the size of a HDFS file changed?
Date Tue, 10 Jan 2017 01:11:47 GMT
Yes, that is why I wonder why this one specific file causes the problem
while the other data files of the Hive table do not.
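
A minimal sketch for narrowing down what is special about this one file, assuming a POSIX shell and the standard `wc`, `cmp`, `tail`, and `od` utilities. The helper names (`byte_count`, `compare_copies`, `show_tail`) are hypothetical and not part of Hadoop. It compares two local copies by size and content, and shows the trailing bytes with control characters made visible.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing local copies of a file; not part of Hadoop.

# Print the number of bytes in a local file.
byte_count() {
  wc -c < "$1" | tr -d '[:space:]'
}

# Print both sizes, then "identical" (status 0) or cmp's report of the
# first difference (status 1).
compare_copies() {
  echo "$1: $(byte_count "$1") bytes"
  echo "$2: $(byte_count "$2") bytes"
  if cmp -s "$1" "$2"; then
    echo "identical"
    return 0
  fi
  # cmp exits non-zero when the files differ; ignore that so we return our own status.
  cmp "$1" "$2" || true
  return 1
}

# Dump the last N bytes (default 16) so that \r, \n and other control
# characters such as ^A (shown as 001) are visible.
show_tail() {
  tail -c "${2:-16}" "$1" | od -c
}
```

For example, `compare_copies AFromGet AFromCat` compares the two local copies from the quoted thread below, and `hdfs dfs -cat A | wc -c` gives the streamed byte count to set against the size that `hdfs dfs -ls A` reports.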

On Tue, Jan 10, 2017 at 3:42 AM, Ravi Prakash <ravihadoop@gmail.com> wrote:
> I have not been able to reproduce this:
>
> [raviprak@ravi ~]$ hdfs dfs -put HuckleberryFinn.txt /
> [raviprak@ravi ~]$ cd /tmp
> [raviprak@ravi tmp]$ hdfs dfs -get /HuckleberryFinn.txt
> [raviprak@ravi tmp]$ hdfs dfs -cat /HuckleberryFinn.txt > hck
> [raviprak@ravi tmp]$ md5sum hck
> 8dc8966178cc1bf4eb95a5b31780269c  hck
> [raviprak@ravi tmp]$ md5sum HuckleberryFinn.txt
> 8dc8966178cc1bf4eb95a5b31780269c  HuckleberryFinn.txt
> [raviprak@ravi tmp]$ hdfs dfs -put hck /
> [raviprak@ravi tmp]$ hdfs dfs -checksum /HuckleberryFinn.txt
> /HuckleberryFinn.txt    MD5-of-0MD5-of-512CRC32C
> 000002000000000000000000c99e8741a1f3d311513df9d9e73b0bc8
> [raviprak@ravi tmp]$ hdfs dfs -checksum /hck
> /hck    MD5-of-0MD5-of-512CRC32C
> 000002000000000000000000c99e8741a1f3d311513df9d9e73b0bc8
>
> This is on trunk.
>
> On Sun, Jan 8, 2017 at 6:52 PM, Mungeol Heo <mungeol.heo@gmail.com> wrote:
>>
>> "^A" is used as the delimiter in the file.
>> However, I don't think this is what causes the problem, because
>> other files also use "^A" as the delimiter and have no problem.
>> BTW, "^A" is used as the delimiter because these files are Hive data.
>>
>> On Sat, Jan 7, 2017 at 12:17 AM, Ravi Prakash <ravihadoop@gmail.com> wrote:
>> > Is there a carriage return / new line / some other whitespace which
>> > `cat` may be appending?
>> >
>> > On Thu, Jan 5, 2017 at 6:09 PM, Mungeol Heo <mungeol.heo@gmail.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> Suppose I call the HDFS file that causes the problem A.
>> >>
>> >> hdfs dfs -ls A
>> >> -rw-r--r--   3 web_admin hdfs  868003931 2017-01-04 09:05 A
>> >>
>> >> hdfs dfs -get A AFromGet
>> >> hdfs dfs -cat A > AFromCat
>> >>
>> >> ls -l
>> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan  5 18:32 AFromGet
>> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan  5 18:32 AFromCat
>> >>
>> >> hdfs dfs -put AFromGet
>> >>
>> >> diff <(hdfs dfs -cat  A) <(hdfs dfs -cat AFromGet)
>> >> (no output, which means the contents of the two files are the same,
>> >> at least after "cat")
>> >>
>> >> hdfs dfs -checksum A
>> >> A   MD5-of-262144MD5-of-512CRC32C
>> >> 000002000000000000040000e667fb4f0dda78101feb2b689af8260b
>> >>
>> >> hdfs dfs -checksum AFromGet
>> >> AFromGet   MD5-of-262144MD5-of-512CRC32C
>> >> 0000020000000000000400007284759249ff98c7395e6a4bb59343dc
>> >>
>> >> As the results above show, the size reported by HDFS differs from the
>> >> size of the local copies. I wonder why the size of the file changed.
>> >> Any help would be great!
>> >>
>> >> Thank you.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
>> >> For additional commands, e-mail: user-help@hadoop.apache.org
>> >>
>> >
>
>
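
The checksum lines quoted above can be read a little more closely. The sketch below assumes, from the algorithm name and the matching hex prefix, that the 56-character string is a 4-byte bytes-per-CRC value, an 8-byte CRCs-per-block value, and a 16-byte MD5, in that order. This layout is an assumption inferred from the output shown in the thread, and `split_checksum` is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/sh
# split_checksum is a hypothetical helper, not part of Hadoop.
# Assumed layout of the hex string printed by `hdfs dfs -checksum`:
#   chars  1-8   bytes per CRC chunk (hex, 4 bytes)
#   chars  9-24  CRCs per block      (hex, 8 bytes)
#   chars 25-56  MD5 digest          (16 bytes)
split_checksum() {
  hex="$1"
  bytes_per_crc_hex=$(printf '%s' "$hex" | cut -c1-8)
  crc_per_block_hex=$(printf '%s' "$hex" | cut -c9-24)
  md5=$(printf '%s' "$hex" | cut -c25-56)
  # Convert the two header fields from hex to decimal; leave the MD5 as hex.
  echo "bytesPerCRC=$((0x$bytes_per_crc_hex)) crcPerBlock=$((0x$crc_per_block_hex)) md5=$md5"
}

# The two values from the original post (A, then the re-uploaded AFromGet).
split_checksum 000002000000000000040000e667fb4f0dda78101feb2b689af8260b
split_checksum 0000020000000000000400007284759249ff98c7395e6a4bb59343dc
```

Under that assumed layout, the header fields of both values decode to 512 and 262144, matching the numbers in the algorithm name, so only the MD5 part differs between A and the re-uploaded AFromGet.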


