hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars George (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-3889) NPE in Distributed Log Splitting
Date Mon, 16 May 2011 12:14:47 GMT
NPE in Distributed Log Splitting
--------------------------------

                 Key: HBASE-3889
                 URL: https://issues.apache.org/jira/browse/HBASE-3889
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.92.0
         Environment: Pseudo-distributed on MacOS
            Reporter: Lars George
             Fix For: 0.92.0


There is an issue with the log splitting under the specific condition of edits belonging to
a non existing region (which went away after a split for example). The HLogSplitter fails
to check the condition, which is handled on a lower level, logging manifests it as 

{noformat}
2011-05-16 13:56:10,300 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: This region's
directory doesn't exist: hdfs://localhost:8020/hbase/usertable/30c4d0a47703214845d0676d0c7b36f0.
It is very likely that it was already split so it's safe to discard those edits.
{noformat}

The code returns a null reference which is not check in HLogSplitter.splitLogFileToTemp():

{code}
...
        WriterAndPath wap = (WriterAndPath)o;
        if (wap == null) {
          wap = createWAP(region, entry, rootDir, tmpname, fs, conf);
          if (wap == null) {
            logWriters.put(region, BAD_WRITER);
          } else {
            logWriters.put(region, wap);
          }
        }
        wap.w.append(entry);
...
{code}

The createWAP does return "null" when the above message is logged based on the obsolete region
reference in the edit.

What made this difficult to detect is that the error (and others) are silently ignored in
SplitLogWorker.grabTask(). I added a catch and error logging to see the NPE that was caused
by the above.

{code}
...
          break;
      }
    } catch (Exception e) {
      LOG.error("An error occurred.", e);
    } finally {
      if (t > 0) {
...
{code}

As a side note, there are other errors/asserts triggered that this try/finally not handles.
For example

{noformat}
2011-05-16 13:58:30,647 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: BADVERSION
failed to assert ownership for /hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for
/hbase/splitlog/hdfs%3A%2F%2Flocalhost%2Fhbase%2F.logs%2F10.0.0.65%2C60020%2C1305406356765%2F10.0.0.65%252C60020%252C1305406356765.1305409968389
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.ownTask(SplitLogWorker.java:329)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.access$100(SplitLogWorker.java:68)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$2.progress(SplitLogWorker.java:265)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:432)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:354)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:260)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:191)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:164)
        at java.lang.Thread.run(Thread.java:680)
{noformat}

This should probably be handled - or at least documented - in another issue?

The NPE made the log split end and the SplitLogManager add an endless amount of RESCAN entries
as this never came to an end.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message