hadoop-hdfs-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Question about hdfs close & hflush behavior
Date Wed, 07 Sep 2011 21:17:10 GMT
2011/9/7 kang hua <kanghua151@msn.com>:
> Hi friends:
>    I have two questions.
>    First one:
>    I use libhdfs's hflush to flush my data to a file. In the same process
> context I can read it back. But the file appears unchanged if I check from the
> hadoop shell ---- its length is zero (checked with "hadoop fs -ls xxx" or by
> reading it in a program); however, when I restart hdfs, I can read the file
> content that I flushed. Why?

If we were to update the file metadata on hflush, it would be very
expensive, since the metadata lives in the NameNode.

If you do hadoop fs -cat xxx, you should see the entirety of the flushed data.
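To illustrate why -ls shows a stale length while -cat sees the data, here is a toy model in plain Python (not the HDFS API; the class and method names are made up for this sketch): the NameNode's recorded length is only updated on close, while a reader that goes to the DataNode sees the flushed bytes immediately.

```python
class NameNode:
    """Toy stand-in: holds only file metadata (length), updated on close."""
    def __init__(self):
        self.lengths = {}

class DataNode:
    """Toy stand-in: holds the actual bytes, visible as soon as they are flushed."""
    def __init__(self):
        self.blocks = {}

class Writer:
    def __init__(self, nn, dn, path):
        self.nn, self.dn, self.path = nn, dn, path
        self.buffer = b""
        nn.lengths.setdefault(path, 0)
        dn.blocks.setdefault(path, b"")

    def write(self, data):
        self.buffer += data

    def hflush(self):
        # Bytes reach the DataNode, but the NameNode's length is NOT updated,
        # so "hadoop fs -ls" (which asks the NameNode) still reports the old size.
        self.dn.blocks[self.path] += self.buffer
        self.buffer = b""

    def close(self):
        self.hflush()
        # Only close updates the metadata that -ls reports.
        self.nn.lengths[self.path] = len(self.dn.blocks[self.path])

# Usage: write and hflush, then compare the two views of the file.
nn, dn = NameNode(), DataNode()
w = Writer(nn, dn, "/f")
w.write(b"hello")
w.hflush()
print(nn.lengths["/f"])   # stale length, like "hadoop fs -ls"
print(dn.blocks["/f"])    # flushed bytes, like "hadoop fs -cat"
w.close()
print(nn.lengths["/f"])   # updated only after close
```

The same asymmetry holds in real HDFS: -ls is a pure metadata call, while -cat reads the block data directly from the DataNodes.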

>    Can I hflush data to a file without closing it, and at the same time read
> the flushed data from another process?


>    Second one:
>    Once an hdfs file is closed, is the last written block left untouched? Even
> if I open that file in append mode, will the namenode allocate a new block for
> the appended data?

No, it reopens the last block of the existing file for append.
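A toy sketch of that "reopen the last block" behavior (plain Python with a made-up tiny block size, not real HDFS code): an append first fills the last partial block, and only allocates a fresh block once the previous one is full.

```python
BLOCK_SIZE = 8  # toy block size for illustration; real HDFS blocks are 64 MB+

def append(blocks, data):
    """Append bytes to a file's block list, filling the last partial block
    before allocating any new blocks."""
    for b in data:
        if blocks and len(blocks[-1]) < BLOCK_SIZE:
            blocks[-1] += bytes([b])   # reopen the last partial block
        else:
            blocks.append(bytes([b]))  # only then allocate a fresh block
    return blocks

# Usage: a file closed with a partial last block gains no new block on append.
blocks = [b"abcde"]
append(blocks, b"xy")
print(blocks)  # still a single block
```

Only when the last block is already full does a further append cause a new block to be allocated.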

>    I find that if I close a file and open it in append mode again and again,
> the hdfs report shows "used space much more than the file logic size".

Not sure I follow what you mean by this. Can you give more detail?

>    btw: I use cloudera ch2

The actual "append()" function has some bugs in all of the 0.20
releases, including Cloudera's. The hflush/sync() API is fine to use,
but I would recommend against using append().

Todd Lipcon
Software Engineer, Cloudera
