hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2283) [hbase] Stuck replay of failed regionserver edits
Date Tue, 27 Nov 2007 21:08:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546003 ]

stack commented on HADOOP-2283:
-------------------------------

Also seeing this while compacting:

{code}
2007-11-27 04:11:04,173 DEBUG hbase.HStore - started compaction of 4 files in /hbase/compaction.dir/hregion_-1572125711/cookie
2007-11-27 04:11:04,193 DEBUG fs.DFSClient - Failed to connect to /38.99.76.30:50010:java.io.IOException: Got error in response to OP_READ_BLOCK
	at org.apache.hadoop.dfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:753)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:979)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1075)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1027)
	at java.io.FilterInputStream.read(FilterInputStream.java:66)
	at java.io.DataInputStream.readByte(DataInputStream.java:248)
	at org.apache.hadoop.hbase.HStoreFile.loadInfo(HStoreFile.java:590)
	at org.apache.hadoop.hbase.HStore.compact(HStore.java:1004)
	at org.apache.hadoop.hbase.HRegion.compactStores(HRegion.java:745)
	at org.apache.hadoop.hbase.HRegion.compactIfNeeded(HRegion.java:704)
	at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:378)
{code}

Nothing in the namenode log about the OP_READ_BLOCK complaint, nor any errors at all other than a few of these:
{code}
2007-11-27 01:29:00,226 WARN  dfs.FSNamesystem - java.io.IOException: Namenode is not expecting an new image UPLOAD_START
2007-11-27 01:29:00,496 WARN  dfs.FSNamesystem - java.io.IOException: Namenode is not expecting an new image UPLOAD_START
{code}
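
The read that dies above is HStoreFile.loadInfo pulling the store's info file over DFS. As a rough illustration only, here is a minimal sketch of a bounded retry around that read; the helper name, attempt count, and backoff are my assumptions, not anything in HBase:

{code}
// Sketch only: bounded retry around a DFS read that can fail transiently
// with "Got error in response to OP_READ_BLOCK" while a datanode is sick.
// Helper name, attempt count, and backoff are assumptions, not HBase code.
long loadInfoWithRetry(org.apache.hadoop.fs.FileSystem fs,
                       org.apache.hadoop.fs.Path infoFile)
    throws java.io.IOException {
  final int MAX_ATTEMPTS = 3;                // assumed bound
  java.io.IOException last = null;
  for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    java.io.DataInputStream in = null;
    try {
      in = fs.open(infoFile);
      in.readByte();                         // the readByte that fails above
      return in.readLong();                  // e.g. the stored sequence id
    } catch (java.io.IOException e) {
      last = e;                              // datanode may recover; retry
      try {
        Thread.sleep(1000L * attempt);       // simple linear backoff
      } catch (InterruptedException ie) {
        throw new java.io.IOException("interrupted while retrying read");
      }
    } finally {
      if (in != null) {
        in.close();
      }
    }
  }
  throw last;                                // exhausted all attempts
}
{code}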

> [hbase] Stuck replay of failed regionserver edits
> -------------------------------------------------
>
>                 Key: HADOOP-2283
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2283
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>
> Looking in the master log for a cluster of ~90 regionservers, the regionserver carrying the ROOT region went down (because it hadn't talked to the master in 30 seconds).
> The master notices the downed regionserver because its lease times out. It then runs the server-shutdown sequence, but while splitting the regionserver's edit log it gets stuck trying to split the second of three log files. Eventually, after ~5 minutes, the second log split throws:
> 2007-11-26 01:21:23,999 WARN  hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
>     at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
>     at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
>     at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
>     at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
> And so on, every 5 minutes.
> Because the regionserver that went down was carrying the ROOT region, and because we are stuck in this eternal loop, ROOT never gets reallocated.
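
What keeps the loop alive: every retry of ProcessServerShutdown asks the namenode to create the same /hbase/hregion_-1194436719/oldlogfile.log while the lease from the previous, failed attempt is still outstanding. As a hedged sketch of one way to break the cycle (not the committed fix; splitLogOnce below is a hypothetical stand-in for the actual split call), the master could catch the decoded AlreadyBeingCreatedException and remove the stale target before retrying:

{code}
// Sketch only, not the committed fix: break the create/lease cycle by
// deleting the stale oldlogfile.log left behind by the previous attempt.
// splitLogOnce() is a hypothetical stand-in for the actual log-split call.
void splitWithStaleFileCleanup(org.apache.hadoop.fs.FileSystem fs,
                               org.apache.hadoop.fs.Path oldLogFile)
    throws java.io.IOException {
  try {
    splitLogOnce(fs, oldLogFile);
  } catch (org.apache.hadoop.dfs.AlreadyBeingCreatedException e) {
    // The namenode still credits the earlier, failed attempt with the
    // lease on oldlogfile.log. Delete the partial file so the create in
    // the retry can succeed instead of looping every 5 minutes.
    fs.delete(oldLogFile);
    splitLogOnce(fs, oldLogFile);
  }
}
{code}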

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

