hadoop-hdfs-issues mailing list archives

From "Gabor Bota (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode
Date Tue, 24 Jul 2018 15:06:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554351#comment-16554351 ]

Gabor Bota commented on HDFS-13672:
-----------------------------------

Adding lazy persist would be a great idea. [~xiaochen], feel free to create a follow-up jira
for it if you want to.
Another good idea is to create a service that iterates through a list of elements while
holding the write lock, running each element through a lambda function. We may want to create
a jira for that as well (kudos to [~andrew.wang]).
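
A minimal sketch of what such a service could look like, using a plain ReentrantReadWriteLock;
the class name, method name, and chunk size below are made up for illustration and are not
tied to the actual FSNamesystemLock API:
{code}
import java.util.Iterator;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/**
 * Hypothetical helper: applies a lambda to each element while holding the write lock,
 * but releases and re-acquires the lock between chunks so a single long scan cannot
 * monopolize the lock.
 */
public class LockedIterationService<T> {

  private final ReentrantReadWriteLock lock;

  public LockedIterationService(ReentrantReadWriteLock lock) {
    this.lock = lock;
  }

  /** Runs {@code action} on every element, re-taking the write lock every chunkSize elements. */
  public void forEachUnderWriteLock(Iterator<T> it, Consumer<T> action, int chunkSize) {
    while (it.hasNext()) {
      lock.writeLock().lock();
      try {
        for (int i = 0; i < chunkSize && it.hasNext(); i++) {
          action.accept(it.next());
        }
      } finally {
        lock.writeLock().unlock();
      }
    }
  }
}
{code}
One caveat: releasing the lock between chunks means the underlying iterator can be invalidated
by concurrent modifications, so a real implementation would have to snapshot or re-fetch the
corrupt replica list for each chunk.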

So to summarize this issue:
* It's not worth making a behavior change for this, since the long blocking scan will most
likely only happen in debugging situations (and we have a workaround).
* The workaround is to disable the scrubber while debugging (see the hdfs-site.xml snippet
below). In real-world/customer environments there are no cases with this many corrupted lazy
persist files.
* I will close this jira as Won't Fix; if there are no follow-up jiras by tomorrow (CET),
I'll create one then.
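
For reference, the workaround would look roughly like this in hdfs-site.xml (assuming the
standard scrubber interval key, and that a value of 0 disables the LazyPersistFileScrubber):
{code}
<property>
  <!-- Assumed key; 0 should disable the lazy persist file scrubber entirely. -->
  <name>dfs.namenode.lazypersist.file.scrub.interval.sec</name>
  <value>0</value>
</property>
{code}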

> clearCorruptLazyPersistFiles could crash NameNode
> -------------------------------------------------
>
>                 Key: HDFS-13672
>                 URL: https://issues.apache.org/jira/browse/HDFS-13672
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Wei-Chiu Chuang
>            Assignee: Gabor Bota
>            Priority: Major
>         Attachments: HDFS-13672.001.patch, HDFS-13672.002.patch, HDFS-13672.003.patch
>
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed FSNamesystem#clearCorruptLazyPersistFiles() held write lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
>         Number of suppressed write-lock reports: 0
>         Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>       writeLock();
>       try {
>         final Iterator<BlockInfo> it =
>             blockManager.getCorruptReplicaBlockIterator();
>         while (it.hasNext()) {
>           Block b = it.next();
>           BlockInfo blockInfo = blockManager.getStoredBlock(b);
>           if (blockInfo.getBlockCollection().getStoragePolicyID() == lpPolicy.getId()) {
>             filesToDelete.add(blockInfo.getBlockCollection());
>           }
>         }
>         for (BlockCollection bc : filesToDelete) {
>           LOG.warn("Removing lazyPersist file " + bc.getName() + " with no replicas.");
>           changed |= deleteInternal(bc.getName(), false, false, false);
>         }
>       } finally {
>         writeUnlock();
>       }
> {code}
> In essence, the iteration over the corrupt replica list should be broken down into smaller iterations to avoid a single long wait.
> Since this operation holds the NameNode write lock for more than 45 seconds, the default ZKFC connection timeout, an extreme case like this (100 million corrupt blocks) could lead to a NameNode failover.
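
For context on the failover risk above: the 45-second figure appears to match the ZKFC health
monitor's RPC timeout. A hedged illustration, assuming the standard core-site.xml key and its
default value:
{code}
<property>
  <!-- Assumed key/default: ZKFC health monitor RPC timeout. A NameNode write lock held
       longer than this can cause the health check to time out and trigger a failover. -->
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>45000</value>
</property>
{code}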



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


