hbase-dev mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-7513) HDFSBlocksDistribution shouldn't send NPEs when something goes wrong
Date Mon, 07 Jan 2013 21:48:13 GMT
Jean-Daniel Cryans created HBASE-7513:
-----------------------------------------

             Summary: HDFSBlocksDistribution shouldn't send NPEs when something goes wrong
                 Key: HBASE-7513
                 URL: https://issues.apache.org/jira/browse/HBASE-7513
             Project: HBase
          Issue Type: Bug
            Reporter: Jean-Daniel Cryans
            Priority: Minor
             Fix For: 0.96.0


I saw a pretty weird failure on a cluster with corrupted files and this particular exception
really threw me off:

{noformat}
2013-01-07 09:58:59,054 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=redacted., starting to roll back the global memstore size.
java.io.IOException: java.io.IOException: java.lang.NullPointerException: empty hosts
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:548)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:461)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3814)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3762)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:332)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.lang.NullPointerException: empty hosts
	at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:403)
	at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:256)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2995)
	at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:523)
	at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:521)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	... 3 more
Caused by: java.lang.NullPointerException: empty hosts
	at org.apache.hadoop.hbase.HDFSBlocksDistribution.addHostsAndBlockWeight(HDFSBlocksDistribution.java:123)
	at org.apache.hadoop.hbase.util.FSUtils.computeHDFSBlocksDistribution(FSUtils.java:597)
	at org.apache.hadoop.hbase.regionserver.StoreFile.computeHDFSBlockDistribution(StoreFile.java:492)
	at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:521)
	at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:602)
	at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:380)
	at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:375)
	... 8 more
2013-01-07 09:58:59,059 INFO org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opening of region "redacted" failed, marking as FAILED_OPEN in ZK
{noformat}

This is what the code looks like:

{code}
if (hosts == null || hosts.length == 0) {
 throw new NullPointerException("empty hosts");
}
{code}

So {{hosts}} can be non-null (just empty) and we still throw an NPE? And then this is wrapped in {{Store}} by:

{code}
} catch (ExecutionException e) {
  throw new IOException(e.getCause());
{code}
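
To illustrate why the log shows the nested "java.io.IOException: java.io.IOException: java.lang.NullPointerException: empty hosts" text, here is a small standalone sketch. {{WrapDemo}} and {{wrap}} are hypothetical names, not HBase code, and the second (outer) wrap assumes {{HRegion}} rewraps the exception the same way, which the stack trace suggests but this message does not show. The point is that {{new IOException(cause)}} uses {{cause.toString()}} as its message, so each wrap prepends another class name.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;

// Hypothetical demo class; it only mirrors the wrapping pattern quoted above.
class WrapDemo {
    // Same shape as the catch block in Store: wrap the cause of the ExecutionException.
    static IOException wrap(ExecutionException e) {
        return new IOException(e.getCause());
    }

    public static void main(String[] args) {
        Throwable npe = new NullPointerException("empty hosts");
        ExecutionException fromFuture = new ExecutionException(npe);

        IOException inner = wrap(fromFuture);
        // Assumption: a caller higher up wraps the IOException once more.
        IOException outer = new IOException(inner);

        // IOException(Throwable) sets the message to cause.toString(),
        // so the class names pile up with every wrap.
        System.out.println(inner);
        System.out.println(outer);
    }
}
```

A wrap that passes an explicit message, for example {{new IOException("Failed to load store files", e.getCause())}}, would keep the cause chain intact without repeating the class names in the top-level message.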

FWIW there's another NPE thrown in {{HDFSBlocksDistribution.addHostAndBlockWeight}} and it
looks wrong.

We should change the code to just skip computing the locality when the host information is
missing, rather than throw big ugly exceptions. In this case the region would fail to open
later anyway, but at least the error message would be clearer.
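
A minimal sketch of the "skip instead of throw" behaviour proposed above. {{BlockDistributionSketch}} is a hypothetical, simplified stand-in, not the real {{HDFSBlocksDistribution}}: its fields, the boolean return value, and the accessor names are assumptions for illustration, and it uses Java 8 APIs for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for HDFSBlocksDistribution (not the HBase class).
class BlockDistributionSketch {
    private final Map<String, Long> hostWeights = new HashMap<>();
    private long totalWeight = 0;

    /**
     * Adds a block's weight to every host that holds it.
     * Returns false and records nothing when no host information is available.
     */
    boolean addHostsAndBlockWeight(String[] hosts, long weight) {
        if (hosts == null || hosts.length == 0) {
            // Locality is unknown for this block: skip it rather than throw an NPE.
            return false;
        }
        totalWeight += weight;
        for (String host : hosts) {
            hostWeights.merge(host, weight, Long::sum);
        }
        return true;
    }

    long getWeight(String host) {
        return hostWeights.getOrDefault(host, 0L);
    }

    long getTotalWeight() {
        return totalWeight;
    }
}
```

With this shape, a block with missing or empty hosts just contributes nothing to the locality numbers, and a caller could log a warning when the method returns false. Any real problem with the file would then surface later with its own error message.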

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
