From: Ted Dunning
Date: Mon, 14 Feb 2011 08:47:43 -0800
Subject: Re: hadoop 0.20 append - some clarifications
To: gokulm@huawei.com
Cc: common-user@hadoop.apache.org, hdfs-user@hadoop.apache.org, dhruba@gmail.com

HDFS definitely doesn't follow anything like POSIX file semantics. POSIX may be a vague inspiration for what HDFS does, but in general the behavior of HDFS is not tightly specified. Even the unit tests have some really surprising behavior.

On Mon, Feb 14, 2011 at 7:21 AM, Gokulakannan M wrote:

> >> I think that in general, the behavior of any program reading data from
> an HDFS file before hsync or close is called is pretty much undefined.
>
> In Unix, users can read a file in parallel while another user is writing
> it, and I suppose the sync feature design is based on that: at any point
> during a write, parallel users should be able to read the file.
>
> https://issues.apache.org/jira/browse/HDFS-142?focusedCommentId=12663958&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12663958
>
> ------------------------------
>
> From: Ted Dunning [mailto:tdunning@maprtech.com]
> Sent: Friday, February 11, 2011 2:14 PM
> To: common-user@hadoop.apache.org; gokulm@huawei.com
> Cc: hdfs-user@hadoop.apache.org; dhruba@gmail.com
> Subject: Re: hadoop 0.20 append - some clarifications
>
> I think that in general, the behavior of any program reading data from an
> HDFS file before hsync or close is called is pretty much undefined.
>
> If you don't wait until some point where part of the file is defined, you
> can't expect any particular behavior.
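To make that concrete, here is a minimal sketch of the window in question (not from the thread; the path, sizes, and timing are invented for illustration). On the 0.20 append branch the flush call is sync(); under HADOOP-6313 it becomes hflush():

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UnsyncedReadDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/unsynced-demo");   // hypothetical path

    FSDataOutputStream out = fs.create(p);
    out.write(new byte[4096]);   // buffered in the client, not yet flushed

    // A second client asking about the file right now gets an undefined
    // answer: a length of 0, 4096, or anything in between may come back.
    System.out.println("length before sync: " + fs.getFileStatus(p).getLen());

    out.sync();    // 0.20-append name; hflush() under HADOOP-6313
    out.close();   // only after sync/close are the bytes guaranteed visible
  }
}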
> On Fri, Feb 11, 2011 at 12:31 AM, Gokulakannan M wrote:
>
> I am not concerned about the sync behavior. The thing is the reader
> reading non-flushed (non-synced) data from HDFS, as you have explained in
> a previous post (in the hadoop 0.20 append branch).
>
> I identified one specific scenario where the above statement does not
> hold. Here is how you can reproduce the problem:
>
> 1. add a debug point at the createBlockOutputStream() method in DFSClient
> and run your HDFS write client in debug mode
>
> 2. allow the client to write 1 block to HDFS
>
> 3. for the 2nd block, the flow will come to the debug point mentioned in
> step 1 (do not execute the createBlockOutputStream() method); hold here
>
> 4. in parallel, try to read the file from another client
>
> Now you will get an error saying that the file cannot be read.
>
> _____
>
> From: Ted Dunning [mailto:tdunning@maprtech.com]
> Sent: Friday, February 11, 2011 11:04 AM
> To: gokulm@huawei.com
> Cc: common-user@hadoop.apache.org; hdfs-user@hadoop.apache.org;
> cos@boudnik.org
> Subject: Re: hadoop 0.20 append - some clarifications
>
> It is a bit confusing.
>
> SequenceFile.Writer#sync isn't really sync.
>
> There is SequenceFile.Writer#syncFs, which is more like what you might
> expect sync to be.
>
> Then there is HADOOP-6313, which specifies hflush and hsync. Generally, if
> you want portable code, you have to reflect a bit to figure out what can
> be done.
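For example, a portable flush can be probed for roughly like this. The probing pattern is only a sketch, but hflush (HADOOP-6313), sync (0.20 append), and SequenceFile.Writer#syncFs are the real method names involved; the same trick works for the writer class:

import java.lang.reflect.Method;
import org.apache.hadoop.fs.FSDataOutputStream;

public class PortableFlush {
  // Prefer hflush() (HADOOP-6313, 0.21+); fall back to sync() (0.20 append).
  private static final Method FLUSH = find("hflush", "sync");

  private static Method find(String... names) {
    for (String name : names) {
      try {
        return FSDataOutputStream.class.getMethod(name);
      } catch (NoSuchMethodException e) {
        // not present in this Hadoop version; try the next candidate
      }
    }
    return null;
  }

  public static void flush(FSDataOutputStream out) throws Exception {
    if (FLUSH == null) {
      throw new UnsupportedOperationException("no usable flush method found");
    }
    FLUSH.invoke(out);   // behaves like hflush where available, sync otherwise
  }
}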
> On Thu, Feb 10, 2011 at 8:38 PM, Gokulakannan M wrote:
>
> Thanks Ted for clarifying.
>
> So the sync is just to flush the current buffers to the datanode and to
> persist the block info in the namenode once per block, isn't it?
>
> Regarding a reader being able to see unflushed data, I faced an issue in
> the following scenario:
>
> 1. a writer is writing a 10 MB file (block size 2 MB)
>
> 2. the writer has written up to 4 MB (2 finalized blocks in current and
> nothing in the blocksBeingWritten directory in the DN), so 2 blocks are
> written
>
> 3. the client calls addBlock for the 3rd block on the namenode and has not
> yet created an output stream to the DN (or written anything to the DN); at
> this point the namenode knows about the 3rd block but the datanode doesn't
>
> 4. at step 3, a reader trying to read the file gets an exception and
> cannot read the file, as the datanode's getBlockInfo returns null to the
> client (of course the DN doesn't know about the 3rd block yet)
>
> In this situation the reader cannot see the file, but when the block write
> is in progress the read is successful.
>
> Is this a bug that needs to be handled in the append branch?
>
> >> -----Original Message-----
> >> From: Konstantin Boudnik [mailto:cos@boudnik.org]
> >> Sent: Friday, February 11, 2011 4:09 AM
> >> To: common-user@hadoop.apache.org
> >> Subject: Re: hadoop 0.20 append - some clarifications
> >>
> >> You might also want to check the append design doc published at HDFS-265
>
> I was asking about the hadoop 0.20 append branch. I suppose HDFS-265's
> design doc won't apply to it.
>
> _____
>
> From: Ted Dunning [mailto:tdunning@maprtech.com]
> Sent: Thursday, February 10, 2011 9:29 PM
> To: common-user@hadoop.apache.org; gokulm@huawei.com
> Cc: hdfs-user@hadoop.apache.org
> Subject: Re: hadoop 0.20 append - some clarifications
>
> Correct is a strong word here.
>
> There is actually an HDFS unit test that checks whether partially written
> and unflushed data is visible. The basic rule of thumb is that you need to
> synchronize readers and writers outside of HDFS. There is no guarantee
> that data is visible or invisible after writing, but there is a guarantee
> that it will become visible after sync or close.
>
> On Thu, Feb 10, 2011 at 7:11 AM, Gokulakannan M wrote:
>
> Is this the correct behavior, or is my understanding wrong?
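For what it's worth, the race described above can be probed without a debugger by shrinking the block size and reading while writing. This is only a sketch under invented assumptions (path, sizes, one JVM standing in for two clients), not the exact repro, which pauses the write client inside createBlockOutputStream():

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockBoundaryRead {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/boundary-demo");   // hypothetical path

    // 2 MB blocks so the writer crosses block boundaries quickly.
    FSDataOutputStream out =
        fs.create(p, true, 4096, (short) 3, 2L * 1024 * 1024);
    byte[] chunk = new byte[1024 * 1024];      // 1 MB per write
    for (int i = 0; i < 10; i++) {
      out.write(chunk);
      out.sync();   // 0.20-append name; hflush() under HADOOP-6313
      // In the real scenario this read happens from a second client. It can
      // fail in the window where the namenode has allocated the next block
      // but no datanode has heard of it yet.
      try {
        FSDataInputStream in = fs.open(p);
        in.read(new byte[4096]);
        in.close();
      } catch (IOException e) {
        System.err.println("read failed after ~" + (i + 1) + " MB: " + e);
      }
    }
    out.close();
  }
}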