hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6618) Edit log corruption may still happen even after HDFS-6527
Date Tue, 01 Jul 2014 20:46:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049309#comment-14049309 ]

Kihwal Lee commented on HDFS-6618:
----------------------------------

I guess we can move it inside the first lock, since it is already holding the directory write
lock. Not many types of ops will go through anyway. But if we remove them as we unlink inodes,
instead of building up a potentially huge data structure and doing it all at once, it may be
faster & cheaper.

Is there a clean way to remove each inode from the inode map within {{destroyAndCollectBlocks()}}
of {{INodeFile}} and {{INodeDirectory}}?
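
For illustration only, a minimal sketch of that idea (the types and method bodies below are
stand-ins, not the real INodeFile/INodeDirectory/FSDirectory APIs): each inode removes itself
from the map as it is destroyed, so no large intermediate collection has to be built first.

{code:java}
// Hypothetical sketch only -- the classes and methods here are stand-ins,
// not the actual HDFS 2.5.0 inode/FSDirectory APIs.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class InodeMapSketch {
    private final Map<Long, Object> byId = new ConcurrentHashMap<>();
    void remove(long inodeId) { byId.remove(inodeId); }
}

class INodeSketch {
    private final long id;
    private final List<INodeSketch> children;   // empty for a "file"

    INodeSketch(long id, List<INodeSketch> children) {
        this.id = id;
        this.children = children;
    }

    // Instead of accumulating every removed inode into one big structure and
    // purging the inode map afterwards, each inode drops itself from the map
    // at the moment it is unlinked/destroyed.
    void destroyAndCollectBlocks(List<Long> collectedBlockIds, InodeMapSketch inodeMap) {
        for (INodeSketch child : children) {
            child.destroyAndCollectBlocks(collectedBlockIds, inodeMap);
        }
        collectedBlockIds.add(id);   // stand-in: the real method collects block IDs here
        inodeMap.remove(id);         // remove this inode from the map immediately
    }
}
{code}

Whether the real code can safely touch the inode map at that point depends on which locks are
held when {{destroyAndCollectBlocks()}} runs, which is exactly the question above.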


> Edit log corruption may still happen even after HDFS-6527
> ---------------------------------------------------------
>
>                 Key: HDFS-6618
>                 URL: https://issues.apache.org/jira/browse/HDFS-6618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.5.0
>            Reporter: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-6618.patch
>
>
> After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters
> until yesterday. Previously, we would see it within 30 minutes on a cluster.
> But the same condition was reproduced even with HDFS-6527.  The only explanation is that
> the RPC handler thread serving {{addBlock()}} was accessing a stale parent value.  Although
> nulling out the parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock,
> there is no memory barrier because there is no "synchronized" block involved in the process.
> I suggest making parent volatile.
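
As context for that suggestion, a generic Java memory-model sketch (not the actual INode code;
the field and method names are illustrative): a plain field written by one thread is not
guaranteed to be visible to a reader that performs no synchronization, whereas a volatile
field is.

{code:java}
// Generic Java memory-model sketch, not HDFS code.  It illustrates why a
// plain field written by one thread can be read as a stale value by another
// thread that takes no lock, and how 'volatile' guarantees visibility.
class ParentVisibilitySketch {
    // With a plain (non-volatile) field, a reader thread that performs no
    // synchronization may keep seeing the old reference after the writer
    // nulls it out.  Declaring it volatile makes the write visible.
    private volatile Object parent = new Object();

    // Writer thread, e.g. a delete path that unlinks the inode.
    void unlink() {
        parent = null;
    }

    // Reader thread, e.g. an RPC handler checking whether the inode is
    // still attached before using it.
    boolean isUnlinked() {
        return parent == null;   // volatile read observes the latest write
    }
}
{code}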



--
This message was sent by Atlassian JIRA
(v6.2#6252)
