hadoop-mapreduce-user mailing list archives

From dlmarion <dlmar...@comcast.net>
Subject Re: missing data blocks after active name node crashes
Date Wed, 11 Feb 2015 13:03:33 GMT

-------- Original message --------
From: Chen Song <chen.song.82@gmail.com>
Date:02/11/2015 7:48 AM (GMT-05:00)
To: user@hadoop.apache.org
Subject: Re: missing data blocks after active name node crashes

Thanks David.

Do you have the relevant Jira ticket number handy?


On Tue, Feb 10, 2015 at 5:54 PM, david marion <dlmarion@hotmail.com> wrote:
I believe there was an issue fixed in 2.5 or 2.6 where the standby NN would not process block
reports from the DNs while it was busy with the checkpoint process. The missing blocks will
get reported eventually.
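
To illustrate why blocks can show up as missing and then recover on their own, here is a toy Python model. It is only a sketch under stated assumptions: `ToyBlockMap` and its methods are hypothetical names, not Hadoop classes, and it does not reproduce the specific bug mentioned above. It assumes only that a NameNode learns block locations from DataNode block reports, so a block it knows about but has no reported location for is counted as missing until a report covering it is processed.

```python
class ToyBlockMap:
    """Hypothetical stand-in for a NameNode's block-to-location map."""

    def __init__(self, known_blocks):
        # Block IDs are known from the namespace; locations start out empty.
        self.locations = {block_id: set() for block_id in known_blocks}

    def process_block_report(self, datanode, block_ids):
        # Record that `datanode` holds a replica of each reported block.
        for block_id in block_ids:
            if block_id in self.locations:
                self.locations[block_id].add(datanode)

    def missing_blocks(self):
        # A block with no known location is reported as missing.
        return sorted(b for b, dns in self.locations.items() if not dns)


block_map = ToyBlockMap(["blk_1", "blk_2", "blk_3"])

# Only one block report has been processed so far.
block_map.process_block_report("dn1", ["blk_1"])
before = block_map.missing_blocks()

# Once the remaining report is processed, nothing is missing.
block_map.process_block_report("dn2", ["blk_2", "blk_3"])
after = block_map.missing_blocks()

print(before)  # ['blk_2', 'blk_3']
print(after)   # []
```

In this model the data was never lost; the missing count drops to zero as soon as the delayed reports are applied.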

-------- Original message --------
From: Chen Song <chen.song.82@gmail.com>
Date:02/10/2015 2:44 PM (GMT-05:00)
To: user@hadoop.apache.org, Ravi Prakash <ravihoo@ymail.com>
Subject: Re: missing data blocks after active name node crashes

Thanks for the reply, Ravi.

In my case, I consistently see missing blocks every time the active name node crashes.
The active name node crashes because of a timeout on the journal nodes.

Could this be a specific case which could lead to missing blocks?


On Tue, Feb 10, 2015 at 2:20 PM, Ravi Prakash <ravihoo@ymail.com> wrote:
Hi Chen!

From my understanding, every operation on the Namenode is logged (and flushed) to disk / QJM
/ shared storage. This includes the addBlock operation. So when a client requests to write
a new block, the metadata is logged by the active NN, so even if it crashes later on, the
new active NN would still see the creation of the block.
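
The log-before-acknowledge idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the NameNode implementation: `EditLog`, `ToyNameNode`, `log_sync` and `catch_up` are hypothetical names, and the shared journal is modelled as an in-memory list rather than QJM or shared storage.

```python
class EditLog:
    """Hypothetical stand-in for the shared journal (QJM / shared storage)."""

    def __init__(self):
        self.entries = []

    def log_sync(self, op, *args):
        # In a real system this returns only after the entry is durable.
        self.entries.append((op, args))


class ToyNameNode:
    """Hypothetical NameNode that keeps block metadata in memory."""

    def __init__(self, journal):
        self.journal = journal
        self.blocks = {}   # path -> list of block IDs
        self.applied = 0   # number of journal entries applied so far

    def _apply(self, op, args):
        if op == "addBlock":
            path, block_id = args
            self.blocks.setdefault(path, []).append(block_id)

    def add_block(self, path, block_id):
        # Log (and flush) first, then update in-memory state.
        self.journal.log_sync("addBlock", path, block_id)
        self._apply("addBlock", (path, block_id))
        self.applied += 1

    def catch_up(self):
        # Replay any journal entries this node has not applied yet.
        for op, args in self.journal.entries[self.applied:]:
            self._apply(op, args)
        self.applied = len(self.journal.entries)


journal = EditLog()
active = ToyNameNode(journal)
standby = ToyNameNode(journal)

active.add_block("/data/file1", "blk_1")
active.add_block("/data/file1", "blk_2")

del active          # simulate a crash of the active node
standby.catch_up()  # the standby replays the journal before taking over

print(standby.blocks)  # {'/data/file1': ['blk_1', 'blk_2']}
```

Because each `addBlock` is written to the journal before the in-memory update, the standby in this sketch sees both blocks after replay even though it never talked to the crashed node.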


On Tuesday, February 10, 2015 9:38 AM, Chen Song <chen.song.82@gmail.com> wrote:

When the active name node crashes, it seems there is always a chance that the data blocks
in flight will be missing.
My understanding is that when the active name node crashes, the metadata of data blocks in
transition which exist in active name node memory is not successfully captured by journal
nodes and thus not available on standby name node when it is promoted to active by zkfc.
Is my understanding correct? Any way to mitigate this problem or race condition?

Chen Song
