From: Todd Lipcon
Date: Thu, 8 Sep 2011 00:00:23 -0700
Subject: Re: Question about hdfs close * hflush behavior
To: hdfs-user@hadoop.apache.org
Reply-To: hdfs-user@hadoop.apache.org

2011/9/7 kang hua :
> Thanks, my friend!
> Please allow me to ask more questions about the details!
> 1 Yes, I can use hadoop fs -tail or -cat xxx to see that file's content,
> but how can I get that file's real size in another process if the namenode
> has not changed? What I really want is to read the data at the tail of
> that file.

You can open the file and then use an API on the DFSInputStream class to
find the length. I don't recall the name of the API, but if you look in
there, you should see it.

> 2 Why is it that "when I reboot hdfs, I can see that file's content that
> I flushed, again by 'hadoop fs -ls xxx'"?

On restart, the namenode triggers block synchronization, and the
up-to-date length is determined.

> 3 In append mode: if I close the file and open it in append mode again
> and again, the real data space increases normally, but the namenode shows
> DFS used space increasing too fast. Is that a bug?

Might be a bug, yes.

> 4 In which version of hdfs is append not buggy?

0.21, which is buggy in other aspects. So, no stable released version has
a working append() call.

In truth I've never seen a _good_ use case for append-to-an-existing-file.
Usually you can do just as well by keeping the file open and periodically
hflushing, or rolling to a new file when you want to add more records to
an existing dataset.

-Todd

>> From: todd@cloudera.com
>> Date: Wed, 7 Sep 2011 14:17:10 -0700
>> Subject: Re: Question about hdfs close * hflush behavior
>> To: hdfs-user@hadoop.apache.org
>>
>> 2011/9/7 kang hua :
>> >
>> > Hi friends:
>> > I have two questions.
>> > The first one is:
>> > I use libhdfs's hflush to flush my data to a file; in the same process
>> > context I can read it. But I find that the file looks unchanged if I
>> > check from the hadoop shell ---- its length is zero (checked by
>> > "hadoop fs -ls xxx" or by reading it in a program); however, when I
>> > reboot hdfs, I can read that file's flushed content again. Why?
>>
>> If we were to update the file metadata on hflush, it would be very
>> expensive, since the metadata lives in the NameNode.
>>
>> If you do hadoop fs -cat xxx, you should see the entirety of the
>> flushed data.
>>
>> > Can I hflush data to a file without closing it, and at the same time
>> > read the flushed data from another process?
>>
>> Yes.
>>
>> > The second one is:
>> > Once an hdfs file is closed, is the last written block untouched? Even
>> > if I open that file in append mode, will the namenode allocate a new
>> > block for the appended data?
>>
>> No, it reopens the last block of the existing file for append.
>>
>> > I find that if I close the file and open it in append mode again and
>> > again, the hdfs report will show "used space much more than the file's
>> > logical size".
>>
>> Not sure I follow what you mean by this. Can you give more detail?
>>
>> > btw: I use cloudera ch2
>>
>> The actual "append()" function has some bugs in all of the 0.20
>> releases, including Cloudera's. The hflush/sync() API is fine to use,
>> but I would recommend against using append().
>>
>> -Todd
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>

--
Todd Lipcon
Software Engineer, Cloudera
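
[Editor's note: Todd's answer to question 1 -- data made visible with hflush() can be read by another reader even though the NameNode-reported length lags -- follows the same shape as flushing a buffered local stream. Below is a minimal sketch of that pattern against the local filesystem with plain java.io, so it needs no running HDFS cluster; on HDFS the writer would hold an open FSDataOutputStream and call hflush() instead of flush(). The class and file names are illustrative, not part of the HDFS API.]

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushVisibility {
    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("records", ".log");

        // The writer keeps the file open, analogous to an open FSDataOutputStream.
        BufferedOutputStream out =
                new BufferedOutputStream(new FileOutputStream(log.toFile()));
        out.write("record-1\n".getBytes("UTF-8"));

        // Before the flush, an independent reader sees nothing yet.
        System.out.println("before flush: " + Files.readAllBytes(log).length + " bytes");

        // flush() pushes the buffered bytes out, analogous to hflush()
        // making data visible to new readers without closing the file.
        out.flush();
        System.out.println("after flush: " + Files.readAllBytes(log).length + " bytes");

        out.close();
        Files.delete(log);
    }
}
```

The second read here plays the role of kang hua's "other process": it opens the file independently of the writer's still-open stream.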
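
[Editor's note: Todd's closing advice -- roll to a new file instead of reopening with append() -- can be sketched as a small writer that switches to a fresh part file once the current one passes a size threshold. This is a local-filesystem illustration; the threshold, the part-file naming, and the RollingWriter helper are made up for the sketch. On HDFS each roll would be a FileSystem.create() of a new path.]

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RollingWriter {
    private final Path dir;
    private final long maxBytes; // roll threshold, chosen for illustration
    private int part = 0;

    RollingWriter(Path dir, long maxBytes) {
        this.dir = dir;
        this.maxBytes = maxBytes;
    }

    // Instead of append()ing to a previously closed file, start a fresh
    // part file once the current one reaches the threshold.
    void write(String record) throws IOException {
        Path current = dir.resolve("part-" + part);
        if (Files.exists(current) && Files.size(current) >= maxBytes) {
            part++;
            current = dir.resolve("part-" + part);
        }
        Files.write(current, (record + "\n").getBytes("UTF-8"),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dataset");
        RollingWriter writer = new RollingWriter(dir, 20);
        for (int i = 0; i < 5; i++) {
            writer.write("record-" + i); // each record is 9 bytes with newline
        }
        List<String> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) parts.add(p.getFileName().toString());
        }
        Collections.sort(parts);
        System.out.println(parts);
    }
}
```

Five 9-byte records against a 20-byte threshold end up spread across two part files, so a reader of the "dataset" simply lists and concatenates the parts instead of depending on a working append().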