hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5367) Restore fsimage locked NameNode too long when the size of fsimage are big
Date Thu, 17 Oct 2013 00:36:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797476#comment-13797476 ]

Suresh Srinivas commented on HDFS-5367:

I agree that when storage directories are being restored during rollEditLog, saving an fsimage
that will soon be replaced by the new checkpointed fsimage seems unnecessary.
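To make that trade-off concrete, here is a toy model in Java. Everything in it (RestoreSketch, StorageDir, restore, the duringRollEditLog flag) is hypothetical: it is not the branch-1 FSImage code and not the attached patch. It only illustrates the idea from the comment, assuming that a directory restored during rollEditLog gets a cheap, empty edits file right away and receives its image from the checkpoint that follows, instead of having the full current fsimage written to it while the lock is held.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not Hadoop's API) of restoring a failed storage
 * directory with or without an immediate full fsimage save.
 */
public class RestoreSketch {
    /** Hypothetical stand-in for a NameNode storage directory. */
    static final class StorageDir {
        final String name;
        boolean hasEdits = false;
        boolean hasImage = false;
        StorageDir(String name) { this.name = name; }
    }

    /** Total fsimage bytes this model writes while the lock is held. */
    long bytesWrittenUnderLock = 0;

    /**
     * Restores the given directories.
     * @param duringRollEditLog true when called from rollEditLog, i.e. a new
     *        checkpointed image is expected to replace the current one soon
     * @param imageBytes size of the current fsimage
     */
    void restore(List<StorageDir> dirs, boolean duringRollEditLog, long imageBytes) {
        for (StorageDir d : dirs) {
            d.hasEdits = true;                 // creating an empty edits file is cheap
            if (!duringRollEditLog) {
                d.hasImage = true;             // full image save: cost grows with image size
                bytesWrittenUnderLock += imageBytes;
            }
            // otherwise defer: assume the upcoming checkpoint supplies the image
        }
    }

    public static void main(String[] args) {
        long fortyGiB = 40L * 1024 * 1024 * 1024;
        List<StorageDir> dirs = new ArrayList<>();
        dirs.add(new StorageDir("nfs"));
        RestoreSketch s = new RestoreSketch();
        s.restore(dirs, true, fortyGiB);
        System.out.println("written under lock: " + s.bytesWrittenUnderLock); // prints 0
    }
}
```

The design point the model captures is that the cost of the deferred path does not depend on the image size, which is what matters once the fsimage reaches tens of gigabytes.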

+1 for the patch. I will commit it soon to branch-1.

bq. John, could you please provide a patch for trunk as well?
Trunk differs a lot from branch-1. Let me know if you need help. Based on this analysis,
this change may not be needed on trunk.

> Restore fsimage locked NameNode too long when the size of fsimage are big
> -------------------------------------------------------------------------
>                 Key: HDFS-5367
>                 URL: https://issues.apache.org/jira/browse/HDFS-5367
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: zhaoyunjiong
>            Assignee: zhaoyunjiong
>         Attachments: HDFS-5367-branch-1.2.patch
> Our cluster has a 40G fsimage, and we write one copy of the edit log to NFS.
> After a temporary NFS failure, the NameNode tries to restore the NFS directory during the
> next checkpoint by saving the full 40G fsimage to it. This takes a long time
> (> 40G / 128MB/s = 320 seconds), the FSNamesystem lock is held the whole time, and this
> brought down our cluster.
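The 320-second figure in the report is a simple size-over-throughput estimate. A minimal Java sketch of the same calculation follows, so other image sizes or NFS speeds can be plugged in. The 40G image and 128MB/s throughput are the reporter's numbers (treated here as 40 GiB and 128 MiB/s); real NFS throughput is an assumption that varies by setup, and the class and method names are illustrative only.

```java
/**
 * Back-of-the-envelope estimate of how long a synchronous fsimage save
 * would hold the FSNamesystem lock. Illustrative only; inputs are assumptions.
 */
public class LockHoldEstimate {
    /** Seconds to write imageMiB at throughputMiBPerSec (doubles avoid integer truncation). */
    static double seconds(double imageMiB, double throughputMiBPerSec) {
        if (throughputMiBPerSec <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return imageMiB / throughputMiBPerSec;
    }

    public static void main(String[] args) {
        double imageMiB = 40 * 1024;          // 40 GiB = 40960 MiB
        double secs = seconds(imageMiB, 128); // reporter's ~128 MiB/s to NFS
        System.out.printf("estimated lock hold: %.0f s%n", secs); // prints 320 s
    }
}
```

Since the estimate is linear in image size, any cluster whose fsimage keeps growing will see this stall grow with it, regardless of how fast the NFS mount is.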

This message was sent by Atlassian JIRA
