hbase-issues mailing list archives

From "Chia-Ping Tsai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18309) Support multi threads in CleanerChore
Date Sun, 17 Dec 2017 09:06:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294049#comment-16294049
] 

Chia-Ping Tsai commented on HBASE-18309:
----------------------------------------

I observed the NPE in the log.
{code}
2017-12-17 08:53:01,584 INFO  [6ff31ba4b7ce,35583,1513500588019_Chore_1] hbase.ScheduledChore(181):
Chore: ReplicationMetaCleaner was stopped
Exception in thread "OldWALsCleaner-1" Exception in thread "OldWALsCleaner-0" java.lang.NullPointerException
	at org.apache.hadoop.hbase.master.cleaner.LogCleaner.deleteFile(LogCleaner.java:166)
	at org.apache.hadoop.hbase.master.cleaner.LogCleaner.lambda$createOldWalsCleaner$0(LogCleaner.java:127)
	at java.lang.Thread.run(Thread.java:748)
java.lang.NullPointerException
	at org.apache.hadoop.hbase.master.cleaner.LogCleaner.deleteFile(LogCleaner.java:166)
	at org.apache.hadoop.hbase.master.cleaner.LogCleaner.lambda$createOldWalsCleaner$0(LogCleaner.java:127)
	at java.lang.Thread.run(Thread.java:748)
{code}

If the thread is interrupted while blocked in pendingDelete.take(), context is still null when the finally block runs, so context.setResult(succeed) throws the NPE.
{code}
    while (true) {
      CleanerContext context = null;
      boolean succeed = false;
      boolean interrupted = false;
      try {
        context = pendingDelete.take();
        if (context != null) {
          FileStatus toClean = context.getTargetToClean();
          succeed = this.fs.delete(toClean.getPath(), false);
        }
      } catch (InterruptedException ite) {
        // It's most likely from configuration changing request
        if (context != null) {
          LOG.warn("Interrupted while cleaning oldWALs " +
              context.getTargetToClean() + ", try to clean it next round.");
        }
        interrupted = true;
      } catch (IOException e) {
        // fs.delete() fails.
        LOG.warn("Failed to clean oldwals with exception: " + e);
        succeed = false;
      } finally {
        context.setResult(succeed);  // here: NPE when context is still null
        if (interrupted) {
          // Restore interrupt status
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
{code}
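The failure mode can be reproduced outside HBase. The sketch below is not the HBase code; it is a minimal stand-in (a StringBuilder plays the role of CleanerContext, and the class and method names are hypothetical) showing that when take() is interrupted the local stays null, so an unguarded use in finally throws, while a null check avoids it.
{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class TakeInterruptDemo {

  // Starts a consumer that blocks on take(), interrupts it, and reports
  // whether the finally block hit a NullPointerException.
  static boolean npeInFinally(boolean guarded) throws InterruptedException {
    BlockingQueue<StringBuilder> pendingDelete = new ArrayBlockingQueue<>(1);
    AtomicBoolean npeSeen = new AtomicBoolean(false);

    Thread worker = new Thread(() -> {
      StringBuilder context = null;
      boolean succeed = false;
      try {
        context = pendingDelete.take();  // interrupt lands here
        succeed = true;
      } catch (InterruptedException ite) {
        Thread.currentThread().interrupt();  // restore interrupt status
      } finally {
        try {
          if (guarded) {
            // Guarded version: skip the result when take() never returned.
            if (context != null) {
              context.append(succeed);
            }
          } else {
            // Unguarded version, as in the reported code: context is null.
            context.append(succeed);
          }
        } catch (NullPointerException npe) {
          npeSeen.set(true);
        }
      }
    });
    worker.start();
    Thread.sleep(200);  // give the worker time to block on take()
    worker.interrupt();
    worker.join();
    return npeSeen.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("unguarded NPE: " + npeInFinally(false));  // true
    System.out.println("guarded NPE:   " + npeInFinally(true));   // false
  }
}
{code}
This suggests the smallest fix is a null check around context.setResult(succeed) in the finally block.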

> Support multi threads in CleanerChore
> -------------------------------------
>
>                 Key: HBASE-18309
>                 URL: https://issues.apache.org/jira/browse/HBASE-18309
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>            Assignee: Reid Chan
>             Fix For: 3.0.0, 2.0.0-beta-1
>
>         Attachments: HBASE-18309.master.001.patch, HBASE-18309.master.002.patch, HBASE-18309.master.004.patch,
HBASE-18309.master.005.patch, HBASE-18309.master.006.patch, HBASE-18309.master.007.patch,
HBASE-18309.master.008.patch, HBASE-18309.master.009.patch, HBASE-18309.master.010.patch,
HBASE-18309.master.011.patch, HBASE-18309.master.012.patch, space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs, and in our big cluster we found
this was not enough. The number of files under oldWALs reached the max-directory-items limit
of HDFS and caused region server crashes, so we switched LogCleaner to multiple threads and the
crashes no longer occur.
> What's more, currently only one thread iterates the archive directory; we could use
multiple threads to clean sub-directories in parallel and speed it up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
