hadoop-common-user mailing list archives

From Gokulakannan M <gok...@huawei.com>
Subject RE: hadoop 0.20 append - some clarifications
Date Fri, 11 Feb 2011 04:38:20 GMT
Thanks Ted for clarifying.

So sync just flushes the current buffers to the datanodes and persists the
block info in the namenode once per block, is that right?

 

Regarding the reader being able to see unflushed data, I ran into an issue in
the following scenario:

1. A writer is writing a 10 MB file (block size 2 MB).

2. The writer has written the file up to 4 MB (2 finalized blocks in the
current directory and nothing in the blocksBeingWritten directory on the DN).
So 2 blocks are written.

3. The client calls addBlock on the namenode for the 3rd block, but has not
yet created the output stream to the DN (or written anything to it). At this
point the namenode knows about the 3rd block but the datanode doesn't.

4. At the point described in step 3, a reader tries to read the file, gets an
exception, and cannot read the file, because the datanode's getBlockInfo
returns null to the client (of course, the DN doesn't know about the 3rd
block yet).

In this situation the reader cannot read the file at all. But once the block
write is in progress, the read succeeds.
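The four steps above can be reproduced in a deliberately simplified toy model. This is a sketch under assumptions, not HDFS code: ToyNameNode, ToyDataNode, toy_read and run_scenario are hypothetical names, and the rule that the reader takes finalized block lengths from the namenode and asks the datanode only about the last block is an assumption chosen to match the behaviour reported here, not something copied from the 0.20-append client.

```python
# Toy model of the reported race; none of these classes are Hadoop APIs.
MB = 1024 * 1024


class ToyNameNode:
    """Tracks block ids allocated to the file (addBlock) and finalized lengths."""

    def __init__(self):
        self.blocks = []     # block ids in file order
        self.finalized = {}  # block id -> length, once the block is complete

    def add_block(self, block_id):
        self.blocks.append(block_id)

    def finalize(self, block_id, length):
        self.finalized[block_id] = length


class ToyDataNode:
    """Tracks the block replicas the datanode actually holds."""

    def __init__(self):
        self.lengths = {}  # block id -> bytes held (finalized or being written)

    def get_block_info(self, block_id):
        # Mirrors the report: a block the DN has never seen yields None.
        return self.lengths.get(block_id)


def toy_read(nn, dn):
    """Readable length of a file under construction, or raise IOError.

    Assumption: finalized lengths come from the namenode; only the last
    block's length is requested from the datanode.
    """
    *done, last = nn.blocks
    total = sum(nn.finalized[b] for b in done)
    length = dn.get_block_info(last)
    if length is None:
        raise IOError(f"datanode returned no info for last block {last}")
    return total + length


def run_scenario():
    nn, dn = ToyNameNode(), ToyDataNode()
    for block_id in (1, 2):        # steps 1-2: two finalized 2 MB blocks
        nn.add_block(block_id)
        dn.lengths[block_id] = 2 * MB
        nn.finalize(block_id, 2 * MB)
    nn.add_block(3)                # step 3: NN allocated block 3, DN has nothing
    try:
        toy_read(nn, dn)           # step 4: reader arrives in the window
        failed_in_window = False
    except IOError:
        failed_in_window = True
    dn.lengths[3] = MB // 2        # pipeline opened, block 3 now being written
    return failed_in_window, toy_read(nn, dn)


print(run_scenario())  # (True, 4718592)
```

In the toy, the read fails only in the gap between addBlock and the first contact with the datanode; once the DN knows the block, the same read returns 4 MB plus the partial third block.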

Is this a bug that needs to be handled in the append branch?

 

>> -----Original Message-----
>> From: Konstantin Boudnik [mailto:cos@boudnik.org] 
>> Sent: Friday, February 11, 2011 4:09 AM
>> To: common-user@hadoop.apache.org
>> Subject: Re: hadoop 0.20 append - some clarifications

>> You might also want to check the append design doc published at HDFS-265

 

I was asking about the Hadoop 0.20 append branch. I suppose the HDFS-265
design doc doesn't apply to it.

 

  _____  

From: Ted Dunning [mailto:tdunning@maprtech.com] 
Sent: Thursday, February 10, 2011 9:29 PM
To: common-user@hadoop.apache.org; gokulm@huawei.com
Cc: hdfs-user@hadoop.apache.org
Subject: Re: hadoop 0.20 append - some clarifications

 

Correct is a strong word here.

 

There is actually an HDFS unit test that checks to see if partially written
and unflushed data is visible.  The basic rule of thumb is that you need to
synchronize readers and writers outside of HDFS.  There is no guarantee that
data is visible or invisible after writing, but there is a guarantee that it
will become visible after sync or close.
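The rule of thumb above (coordinate readers and writers outside HDFS, and trust only what sync or close has covered) can be sketched as a toy. This is not the Hadoop API: ToyStream stands in for the output stream, and LengthRegistry is a hypothetical out-of-band channel outside HDFS where the writer announces how much it has synced. The writer publishes a length only after sync returns, so a reader never relies on bytes beyond the guaranteed-visible prefix, even if unsynced data happens to be visible.

```python
import threading


class ToyStream:
    """Toy stand-in for an HDFS output stream (not the Hadoop API).

    `written` holds everything passed to write(); `synced_len` is the prefix
    covered by the last sync(), i.e. guaranteed visible. Bytes past it may or
    may not be visible to a reader.
    """

    def __init__(self):
        self.written = bytearray()
        self.synced_len = 0

    def write(self, data: bytes) -> None:
        self.written += data

    def sync(self) -> int:
        self.synced_len = len(self.written)
        return self.synced_len


class LengthRegistry:
    """Hypothetical coordination channel outside HDFS; readers trust only it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._length = 0

    def publish(self, length: int) -> None:
        with self._lock:
            # Never move backwards, even if an older length arrives late.
            self._length = max(self._length, length)

    def current(self) -> int:
        with self._lock:
            return self._length


def safe_read(stream: ToyStream, registry: LengthRegistry) -> bytes:
    # Read only the prefix the writer announced after a sync, ignoring any
    # unsynced bytes that might happen to be visible.
    return bytes(stream.written[: registry.current()])


stream, registry = ToyStream(), LengthRegistry()
stream.write(b"record-1\n")
registry.publish(stream.sync())  # sync first, then announce the length
stream.write(b"record-2\n")      # not synced yet: visibility not guaranteed
assert safe_read(stream, registry) == b"record-1\n"
```

The ordering is the point of the design: announcing a length before sync returns would let a reader rely on data that HDFS does not yet guarantee to be visible.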

On Thu, Feb 10, 2011 at 7:11 AM, Gokulakannan M <gokulm@huawei.com> wrote:

Is this the correct behavior, or is my understanding wrong?

 

