hadoop-hdfs-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7609) startup used too much time to load edits
Date Thu, 21 May 2015 20:40:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555003#comment-14555003 ]

Ming Ma commented on HDFS-7609:
-------------------------------

Thanks, [~jingzhao]. In the scenario you described, IIUC, for the call in #6 to block
on {{FSNamesystem#delete}}, it first needs to pass {{checkOperation(OperationCategory.WRITE)}}.
But given that the new ANN hasn't transitioned to active yet, the call should already have
received a StandbyException. As for throwing StandbyException earlier, we could add the check
to NameNodeRpcServer, but that seems unnecessary. Suggestions?
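
To make the ordering above concrete, here is a minimal, self-contained sketch of the argument. It is not the actual Hadoop code: {{Namesystem}}, {{HAState}}, and this {{StandbyException}} are simplified stand-ins written for illustration, and only the rejection of WRITE operations in standby is modeled. It assumes, as described above, that the operation check runs before the call tries to take the namesystem write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StandbyCheckSketch {
    enum OperationCategory { READ, WRITE }
    enum HAState { STANDBY, ACTIVE }

    /** Simplified stand-in for Hadoop's StandbyException (illustration only). */
    static class StandbyException extends Exception {
        StandbyException(String msg) { super(msg); }
    }

    /** Hypothetical, stripped-down namesystem; not the real FSNamesystem. */
    static class Namesystem {
        private volatile HAState state = HAState.STANDBY;
        private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

        void transitionToActive() { state = HAState.ACTIVE; }

        /** Rejects WRITE operations while the node is still in standby. */
        void checkOperation(OperationCategory op) throws StandbyException {
            if (state == HAState.STANDBY && op == OperationCategory.WRITE) {
                throw new StandbyException(
                    "Operation category " + op + " is not supported in state " + state);
            }
        }

        boolean delete(String path) throws StandbyException {
            // First check happens before the lock, so a call arriving at a
            // node that is not yet active fails here and never waits on it.
            checkOperation(OperationCategory.WRITE);
            fsLock.writeLock().lock();
            try {
                // Re-check under the lock in case the state changed meanwhile.
                checkOperation(OperationCategory.WRITE);
                return true; // the actual deletion is omitted in this sketch
            } finally {
                fsLock.writeLock().unlock();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Namesystem ns = new Namesystem();
        try {
            ns.delete("/tmp/example");
        } catch (StandbyException e) {
            System.out.println("Rejected before locking: " + e.getMessage());
        }
        ns.transitionToActive();
        System.out.println("After transition, delete returned " + ns.delete("/tmp/example"));
    }
}
```

In this sketch the standby node rejects the delete without ever contending for the lock, which is why an additional, earlier check in the RPC layer looks redundant under these assumptions.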

> startup used too much time to load edits
> ----------------------------------------
>
>                 Key: HDFS-7609
>                 URL: https://issues.apache.org/jira/browse/HDFS-7609
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.2.0
>            Reporter: Carrey Zhan
>            Assignee: Ming Ma
>              Labels: BB2015-05-RFC
>         Attachments: HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch
>
>
> One day my namenode crashed because two journal nodes timed out at the same time under
> very high load, leaving behind about 100 million transactions in the edits log. (I still
> have no idea why they were not rolled into the fsimage.)
> I tried to restart the namenode, but it showed that almost 20 hours would be needed to
> finish, and it was loading fsedits most of the time. I also tried to restart the namenode
> in recovery mode, but the loading speed was no different.
> I looked into the stack trace and concluded that the slowness was caused by the retry
> cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished
> in half an hour.
> I think the retry cache is useless during startup, or at least during the recovery process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
