hadoop-hdfs-issues mailing list archives

From "Lohit Vijayarenu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6248) SNN crash during replay of FSEditLog of files inside directories having QuotaExceeded directories
Date Wed, 16 Apr 2014 02:50:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970372#comment-13970372 ]

Lohit Vijayarenu commented on HDFS-6248:
----------------------------------------

One corner case I suspect is the quota check done in FSDirectory::addChild on the active vs.
standby namenodes. When a file is created by the active namenode and synced to the edit log,
the active NN's quota usage might be close to its max. By the time the standby NN replays this
edit, the space consumed against the quota could have increased because of other files in the
directory, so a valid edit might hit QuotaExceededException.
I feel that when the standby namenode replays edits, it should skip the quota check, since the
quota is already enforced by the active namenode anyway. This should resolve the race condition
and prevent the standby namenode from crashing. What do others think about this approach?
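
To make the proposal concrete, here is a rough, self-contained Java sketch. It is not the
actual HDFS code: QuotaReplaySketch, QuotaDir, QuotaExceeded and the skipQuotaCheck flag are
hypothetical names invented for illustration, and the numbers are made up. It only shows the
idea that the same edit fails when the quota is enforced during replay and is applied when the
replaying side trusts the active namenode's earlier check.

```java
public class QuotaReplaySketch {

    /** Hypothetical, unchecked stand-in for a quota violation (not the HDFS class). */
    public static class QuotaExceeded extends RuntimeException {
        public QuotaExceeded(String msg) {
            super(msg);
        }
    }

    /** Toy directory that tracks consumed space against a fixed space quota. */
    public static class QuotaDir {
        private final long quota;
        private long used;
        // Assumption: true models a standby replaying edits that the active
        // namenode has already validated against the quota.
        private final boolean skipQuotaCheck;

        public QuotaDir(long quota, long used, boolean skipQuotaCheck) {
            this.quota = quota;
            this.used = used;
            this.skipQuotaCheck = skipQuotaCheck;
        }

        /** Adds a child of the given size, enforcing the quota unless it is skipped. */
        public void addChild(long size) {
            if (!skipQuotaCheck && used + size > quota) {
                throw new QuotaExceeded(
                    "quota=" + quota + " would be exceeded, needed=" + (used + size));
            }
            used += size;
        }

        public long used() {
            return used;
        }
    }

    public static void main(String[] args) {
        // Replay with the quota enforced: a valid edit is rejected.
        QuotaDir strict = new QuotaDir(100, 90, false);
        try {
            strict.addChild(20);
        } catch (QuotaExceeded e) {
            System.out.println("strict replay failed: " + e.getMessage());
        }

        // Replay with the check skipped: the edit is applied as logged.
        QuotaDir lenient = new QuotaDir(100, 90, true);
        lenient.addChild(20);
        System.out.println("lenient replay used=" + lenient.used());
    }
}
```

In this sketch the lenient directory ends up over its quota, which seems acceptable for a
replaying node because it only mirrors state the active namenode has already accepted.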

> SNN crash during replay of FSEditLog of files inside directories having QuotaExceeded directories
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6248
>                 URL: https://issues.apache.org/jira/browse/HDFS-6248
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.6-alpha, 2.4.0
>         Environment: NameNode HA setup with Active/Standby using QJM
>            Reporter: Lohit Vijayarenu
>
> We are seeing cases where the Standby NameNode crashes without recovering when it tries to
> replay edit log entries for files under directories that have exceeded their quota. While
> debugging we got a stack trace, but we are still trying to reproduce this and wanted to note
> it here in case anyone else has already seen this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)