hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
Date Tue, 17 Mar 2015 18:51:39 GMT

     [ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-7587:
----------------------------
    Attachment: HDFS-7587.001.patch

Rebased Daryn's patch. Also made changes based on Nicholas's comments, i.e., verify the
quota first, then update the quota after the action.
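
The ordering described above (verify, act, then update) can be illustrated with a minimal
sketch. This is not code from the patch: every class and method name below is hypothetical,
the quota is a toy counter, and the quota delta is passed in rather than computed from
block sizes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exception for this sketch; HDFS defines its own quota exceptions. */
class QuotaSketchException extends Exception {
    QuotaSketchException(String message) { super(message); }
}

/** Toy space-quota counter, not the HDFS implementation. */
class SpaceQuotaSketch {
    private final long limit;
    private long used;

    SpaceQuotaSketch(long limit, long used) {
        this.limit = limit;
        this.used = used;
    }

    /** Pure check: throws if the delta would exceed the limit, changes nothing. */
    void verify(long delta) throws QuotaSketchException {
        if (used + delta > limit) {
            throw new QuotaSketchException(
                "quota exceeded: used=" + used + ", delta=" + delta + ", limit=" + limit);
        }
    }

    /** Records the usage; called only after the action has been applied. */
    void update(long delta) { used += delta; }

    long used() { return used; }
}

/** Toy file state: just the two flags mentioned in the issue description. */
class FileSketch {
    boolean underConstruction;
    boolean hasLease;
}

class AppendOrderSketch {
    /** Verify first, then act, then update the quota. */
    static void append(FileSketch file, SpaceQuotaSketch quota, long delta, List<String> editLog)
            throws QuotaSketchException {
        quota.verify(delta);            // 1. may throw; no state has changed yet
        file.underConstruction = true;  // 2. the action: reopen the file for writing
        file.hasLease = true;
        editLog.add("OP_ADD");          //    and log it
        quota.update(delta);            // 3. charge the quota after the action
    }

    public static void main(String[] args) {
        SpaceQuotaSketch quota = new SpaceQuotaSketch(100, 95);
        FileSketch file = new FileSketch();
        List<String> editLog = new ArrayList<>();
        try {
            append(file, quota, 10, editLog);
        } catch (QuotaSketchException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("underConstruction=" + file.underConstruction + ", edits=" + editLog);
    }
}
```

Because the check in step 1 has no side effects, a quota violation in this sketch leaves the
file closed, without a lease, and with nothing written to the edit log.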

With the fix from HDFS-7943, we will not have blocks larger than the preferred block
size, so we can avoid "earning back" quota scenarios.

Truncate may have a similar issue when the data to truncate is only part of the original last
block. I will update the patch later to fix this part.



> Edit log corruption can happen if append fails with a quota violation
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7587
>                 URL: https://issues.apache.org/jira/browse/HDFS-7587
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-7587.001.patch, HDFS-7587.patch
>
>
> We have seen a standby namenode crashing due to edit log corruption. It was complaining
> that {{OP_CLOSE}} cannot be applied because the file is not under-construction.
> When a client was trying to append to the file, the remaining space quota was very small.
> This caused a failure in {{prepareFileForWrite()}}, but only after the inode was already converted
> for writing and a lease added. Since these were not undone when the quota violation was detected,
> the file was left in the under-construction state with an active lease, without edit logging {{OP_ADD}}.
> A subsequent {{append()}} eventually caused a lease recovery after the soft limit period.
> This resulted in {{commitBlockSynchronization()}}, which closed the file with {{OP_CLOSE}}
> being logged. Since there was no corresponding {{OP_ADD}}, edit replaying could not apply
> this.
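
The failure sequence in the quoted description can be modeled with a small toy example. This is
only a sketch under loud assumptions: the classes, the string-based edit log, and the replay
rules below are all hypothetical simplifications and do not reflect the real namenode code or
edit log format. It shows how mutating state before the quota check, without undoing it or
logging {{OP_ADD}}, later yields an {{OP_CLOSE}} that a replaying node cannot apply.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy active namenode: tracks which paths are under construction and logs edits. */
class ActiveNamenodeSketch {
    final List<String> editLog = new ArrayList<>();
    private final Map<String, Boolean> underConstruction = new HashMap<>();
    private final long remainingQuota;

    ActiveNamenodeSketch(long remainingQuota) { this.remainingQuota = remainingQuota; }

    /** Creates a file and closes it normally, logging both operations. */
    void createAndClose(String path) {
        editLog.add("OP_ADD " + path);
        editLog.add("OP_CLOSE " + path);
        underConstruction.put(path, false);
    }

    /** Buggy order: the state is changed before the quota check and never undone. */
    void buggyAppend(String path, long needed) {
        underConstruction.put(path, true);        // inode converted, lease added
        if (needed > remainingQuota) {
            throw new IllegalStateException("quota exceeded for " + path);
        }
        editLog.add("OP_ADD " + path);            // never reached on a quota violation
    }

    /** Lease recovery closes any file still under construction and logs OP_CLOSE. */
    void recoverLease(String path) {
        if (underConstruction.getOrDefault(path, false)) {
            underConstruction.put(path, false);
            editLog.add("OP_CLOSE " + path);
        }
    }
}

class EditReplaySketch {
    /** Replays the toy log; returns null on success or an error message on failure. */
    static String replay(List<String> ops) {
        Map<String, Boolean> underConstruction = new HashMap<>();
        for (String op : ops) {
            String[] parts = op.split(" ", 2);
            String code = parts[0];
            String path = parts[1];
            if (code.equals("OP_ADD")) {
                underConstruction.put(path, true);
            } else if (code.equals("OP_CLOSE")) {
                if (!underConstruction.getOrDefault(path, false)) {
                    return "OP_CLOSE cannot be applied: " + path + " is not under construction";
                }
                underConstruction.put(path, false);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ActiveNamenodeSketch active = new ActiveNamenodeSketch(1);
        active.createAndClose("/f");
        try {
            active.buggyAppend("/f", 10);
        } catch (IllegalStateException e) {
            System.out.println("append failed: " + e.getMessage());
        }
        active.recoverLease("/f");                // logs a stray OP_CLOSE
        System.out.println("edit log: " + active.editLog);
        System.out.println("replay result: " + replay(active.editLog));
    }
}
```

In this model the replay fails on the last {{OP_CLOSE}} because no {{OP_ADD}} reopened the file,
which mirrors the standby crash described above.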



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
