hadoop-user mailing list archives

From Ravi Prakash <ravihad...@gmail.com>
Subject Re: Why is the size of a HDFS file changed?
Date Mon, 09 Jan 2017 18:42:16 GMT
I have not been able to reproduce this:

[raviprak@ravi ~]$ hdfs dfs -put HuckleberryFinn.txt /
[raviprak@ravi ~]$ cd /tmp
[raviprak@ravi tmp]$ hdfs dfs -get /HuckleberryFinn.txt
[raviprak@ravi tmp]$ hdfs dfs -cat /HuckleberryFinn.txt > hck
[raviprak@ravi tmp]$ md5sum hck
8dc8966178cc1bf4eb95a5b31780269c  hck
[raviprak@ravi tmp]$ md5sum HuckleberryFinn.txt
8dc8966178cc1bf4eb95a5b31780269c  HuckleberryFinn.txt
[raviprak@ravi tmp]$ hdfs dfs -put hck /
[raviprak@ravi tmp]$ hdfs dfs -checksum /HuckleberryFinn.txt
/HuckleberryFinn.txt    MD5-of-0MD5-of-512CRC32C
000002000000000000000000c99e8741a1f3d311513df9d9e73b0bc8
[raviprak@ravi tmp]$ hdfs dfs -checksum /hck
/hck    MD5-of-0MD5-of-512CRC32C
000002000000000000000000c99e8741a1f3d311513df9d9e73b0bc8

This is on trunk.
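
A minimal sketch of the local comparison above, in case it helps to repeat it on the file that misbehaves. The function name `compare_copies` is hypothetical (it is not part of Hadoop), and the sketch assumes `md5sum`, `wc`, and `cmp` from coreutils are available:

```shell
# compare_copies is a hypothetical helper, not a Hadoop command.
# It prints the MD5 and byte count of two local files and
# returns 0 only when the files are byte-for-byte identical.
compare_copies() {
    for f in "$1" "$2"; do
        # Reading from stdin keeps the file name out of wc's output.
        size=$(wc -c < "$f" | tr -d ' ')
        sum=$(md5sum "$f" | cut -d ' ' -f 1)
        printf '%s  %s bytes  %s\n' "$sum" "$size" "$f"
    done
    # cmp -s prints nothing and exits non-zero at the first differing byte.
    cmp -s "$1" "$2"
}
```

For example, `compare_copies HuckleberryFinn.txt hck && echo identical` would cover both the size and the content check in one step.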

On Sun, Jan 8, 2017 at 6:52 PM, Mungeol Heo <mungeol.heo@gmail.com> wrote:

> "^A" is used as delimiter in the file.
> However, I don't think this is what causes the problem, because
> other files also use "^A" as the delimiter and have no problem.
> BTW, "^A" is the delimiter because these files are Hive data.
>
> On Sat, Jan 7, 2017 at 12:17 AM, Ravi Prakash <ravihadoop@gmail.com>
> wrote:
> > Is there a carriage return / new line / some other whitespace which `cat`
> > may be appending?
> >
> > On Thu, Jan 5, 2017 at 6:09 PM, Mungeol Heo <mungeol.heo@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> Suppose I call the HDFS file that causes the problem A.
> >>
> >> hdfs dfs -ls A
> >> -rw-r--r--   3 web_admin hdfs  868003931 2017-01-04 09:05 A
> >>
> >> hdfs dfs -get A AFromGet
> >> hdfs dfs -cat A > AFromCat
> >>
> >> ls -l
> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan  5 18:32 AFromGet
> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan  5 18:32 AFromCat
> >>
> >> hdfs dfs -put AFromGet
> >>
> >> diff <(hdfs dfs -cat  A) <(hdfs dfs -cat AFromGet)
> >> (no output, which means the contents of the two files are the same,
> >> at least after "cat")
> >>
> >> hdfs dfs -checksum A
> >> A   MD5-of-262144MD5-of-512CRC32C
> >> 000002000000000000040000e667fb4f0dda78101feb2b689af8260b
> >>
> >> hdfs dfs -checksum AFromGet
> >> AFromGet   MD5-of-262144MD5-of-512CRC32C
> >> 0000020000000000000400007284759249ff98c7395e6a4bb59343dc
> >>
> >> As the results above show, the size of the file changes after "get"
> >> and "cat". I wonder why.
> >> Any help will be GREAT!
> >>
> >> Thank you.
> >>
> >>
> >
>
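
The hex strings printed by `hdfs dfs -checksum` in this thread appear to carry the same two numbers as their labels, followed by the digest. The sketch below splits one string on that basis. The field layout (8 hex digits for bytesPerCRC, 16 for crcPerBlock, 32 for the MD5) is an assumption inferred from how the values in this thread line up with `MD5-of-262144MD5-of-512CRC32C`, and `split_hdfs_checksum` is a hypothetical name:

```shell
# split_hdfs_checksum is a hypothetical helper, not a Hadoop command.
# Assumed layout of the 56-digit hex string (inferred from this thread):
#   digits  1-8   bytesPerCRC  (e.g. 00000200 = 512)
#   digits  9-24  crcPerBlock  (e.g. 0000000000040000 = 262144)
#   digits 25-56  MD5 digest
split_hdfs_checksum() {
    hex=$1
    # printf '%d' accepts a 0x-prefixed constant and prints it in decimal.
    bytes_per_crc=$(printf '%d' "0x$(printf '%s' "$hex" | cut -c1-8)")
    crc_per_block=$(printf '%d' "0x$(printf '%s' "$hex" | cut -c9-24)")
    digest=$(printf '%s' "$hex" | cut -c25-56)
    printf 'bytesPerCRC=%s crcPerBlock=%s md5=%s\n' \
        "$bytes_per_crc" "$crc_per_block" "$digest"
}
```

Run on the two strings reported for A and AFromGet, the first two fields come out the same and only the digest differs, so under this assumed layout the checksum mismatch would not be explained by different checksum parameters.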
