hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-11813) CellScanner#advance may infinitely recurse
Date Sat, 23 Aug 2014 19:44:10 GMT
Andrew Purtell created HBASE-11813:
--------------------------------------

             Summary: CellScanner#advance may infinitely recurse
                 Key: HBASE-11813
                 URL: https://issues.apache.org/jira/browse/HBASE-11813
             Project: HBase
          Issue Type: Bug
            Reporter: Andrew Purtell
            Priority: Blocker
             Fix For: 0.99.0, 2.0.0, 0.98.6


On user@hbase, johannes.schaback@visual-meta.com reported:
{quote}
we face a serious issue with our HBase production cluster for two days now. Every couple minutes,
a random RegionServer gets stuck and does not process any requests. In addition this causes
the other RegionServers to freeze within a minute which brings down the entire cluster. Stopping
the affected RegionServer unblocks the cluster and everything comes back to normal.
{quote}

Subsequent troubleshooting reveals that RPC is getting stuck because we are losing RPC handlers.
In the .out files we have this:
{noformat}
Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020"
java.lang.StackOverflowError
        at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
        at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
        at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
        at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
[...]
Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=18,queue=0,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=23,queue=2,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=24,queue=0,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=2,queue=2,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=11,queue=2,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=25,queue=1,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=20,queue=2,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=19,queue=1,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=15,queue=0,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=1,queue=1,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=7,queue=1,port=60020"
java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=4,queue=1,port=60020"
java.lang.StackOverflowError
{noformat}

That is the anonymous CellScanner instance we create from CellUtil#createCellScanner:
{code}
    return new CellScanner() {
      private final Iterator<? extends CellScannable> iterator = cellScannerables.iterator();
      private CellScanner cellScanner = null;

      @Override
      public Cell current() {
        return this.cellScanner != null? this.cellScanner.current(): null;
      }

      @Override
      public boolean advance() throws IOException {
        if (this.cellScanner == null) {
          if (!this.iterator.hasNext()) return false;
          this.cellScanner = this.iterator.next().cellScanner();
        }
        if (this.cellScanner.advance()) return true;
        this.cellScanner = null;
        return advance(); // <--- recursive call
      }
    };
{code}

That final return statement is the immediate problem.
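
One possible way to remove the recursion is to loop over the underlying scanners instead, so that a long run of exhausted or empty scanners costs no extra stack frames. The following is only a sketch, not the committed fix: Cell, CellScanner and CellScannable are minimal stand-ins declared here so the example compiles on its own, and emptiesThenOne, single and countCells are hypothetical helpers for the demonstration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class IterativeCellScannerSketch {
  // Stand-ins for the HBase types, declared here only to keep the sketch self-contained.
  interface Cell {}
  interface CellScanner {
    Cell current();
    boolean advance() throws IOException;
  }
  interface CellScannable {
    CellScanner cellScanner();
  }

  static CellScanner createCellScanner(final List<? extends CellScannable> cellScannerables) {
    return new CellScanner() {
      private final Iterator<? extends CellScannable> iterator = cellScannerables.iterator();
      private CellScanner cellScanner = null;

      @Override
      public Cell current() {
        return this.cellScanner != null ? this.cellScanner.current() : null;
      }

      @Override
      public boolean advance() throws IOException {
        // Loop instead of recursing: each exhausted scanner is dropped and the
        // next one is tried in the same stack frame.
        while (true) {
          if (this.cellScanner == null) {
            if (!this.iterator.hasNext()) return false;
            this.cellScanner = this.iterator.next().cellScanner();
          }
          if (this.cellScanner.advance()) return true;
          this.cellScanner = null;
        }
      }
    };
  }

  // Hypothetical helper: a scannable that yields exactly one cell.
  static CellScannable single(final Cell cell) {
    return () -> new CellScanner() {
      private int pos = -1;
      @Override public Cell current() { return pos == 0 ? cell : null; }
      @Override public boolean advance() { if (pos < 1) pos++; return pos == 0; }
    };
  }

  // Hypothetical helper: many scannables with no cells, followed by one with a single cell.
  static List<CellScannable> emptiesThenOne(int empties, Cell cell) {
    CellScannable empty = () -> new CellScanner() {
      @Override public Cell current() { return null; }
      @Override public boolean advance() { return false; }
    };
    List<CellScannable> list = new ArrayList<>(Collections.nCopies(empties, empty));
    list.add(single(cell));
    return list;
  }

  // Hypothetical helper: drains a scanner and returns how many cells it produced.
  static int countCells(CellScanner scanner) {
    int n = 0;
    try {
      while (scanner.advance()) n++;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return n;
  }

  public static void main(String[] args) {
    // One million empty scannables in a row would need one frame each in the
    // recursive version; the loop handles them in constant stack depth.
    List<CellScannable> input = emptiesThenOne(1_000_000, new Cell() {});
    System.out.println("cells=" + countCells(createCellScanner(input)));
  }
}
```

The behavior of current() and the return values of advance() are meant to match the snippet above; only the control flow changes.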

We should also fix this so the RegionServer aborts if it loses a handler to an Error. 
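
As a rough illustration of that second point, a handler thread could be given an uncaught-exception handler that escalates any Error to a server-wide abort rather than letting the thread die quietly. This is a sketch under assumptions, not how the RPC server is actually wired: the Abortable interface below is a stand-in declared for the example, and newHandler and runAndCapture are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

class AbortOnHandlerErrorSketch {
  // Stand-in for whatever abort hook the server exposes; declared here for the sketch.
  interface Abortable {
    void abort(String why, Throwable cause);
  }

  // Creates a handler thread that asks the server to abort if it dies from an Error.
  static Thread newHandler(String name, Runnable work, Abortable server) {
    Thread t = new Thread(work, name);
    t.setUncaughtExceptionHandler((thread, e) -> {
      if (e instanceof Error) {
        server.abort("Lost handler " + thread.getName() + " to an Error", e);
      }
      // Other uncaught exceptions are ignored in this sketch.
    });
    return t;
  }

  // Hypothetical helper: runs the work on a handler thread and returns the
  // Throwable passed to abort(), or null if abort() was never called.
  static Throwable runAndCapture(Runnable work) {
    AtomicReference<Throwable> seen = new AtomicReference<>();
    Thread t = newHandler("handler-sketch", work, (why, cause) -> seen.set(cause));
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen.get();
  }

  public static void main(String[] args) {
    Throwable cause = runAndCapture(() -> { throw new StackOverflowError(); });
    System.out.println("abort requested: " + (cause != null));
  }
}
```

Whether the check belongs in an uncaught-exception handler or in a catch block inside the handler's run loop is a design choice this sketch does not settle.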




--
This message was sent by Atlassian JIRA
(v6.2#6252)
