hbase-issues mailing list archives

From "fulin wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4093) When verifyAndAssignRoot throw exception, The deadServers state can not be changed.
Date Tue, 19 Jul 2011 02:34:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067457#comment-13067457
] 

fulin wang commented on HBASE-4093:
-----------------------------------

This is a protection for when the system is in a fault state.
When 'this.data' in blockUntilAvailable is null, verifyAndAssignRoot() waits; when 'this.data' is not null, it does not wait.
This issue happened in the verifyRegionLocation() method, and the exception was a SocketTimeoutException.

So I think we should sleep one second and retry five times, to try to handle this fault state.
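A minimal sketch of that retry idea (not the attached patch; the wrapper name verifyAndAssignRootWithRetries and the hard-coded values are illustrative assumptions, while verifyAndAssignRoot() is the existing ServerShutdownHandler method shown in the stack trace below):

{code:java}
// Illustrative sketch only: retry verifyAndAssignRoot() a few times so a
// transient SocketTimeoutException does not abort the shutdown handler and
// leave the server stuck in the dead-servers list.
private void verifyAndAssignRootWithRetries() throws IOException {
  int retries = 5;           // retry five times, as proposed above
  final long sleepMs = 1000; // sleep one second between attempts
  while (true) {
    try {
      verifyAndAssignRoot(); // existing method; throws on RPC failure
      return;                // success, nothing left to do
    } catch (IOException ioe) {
      if (--retries <= 0) {
        throw ioe;           // out of attempts, surface the failure
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while retrying verifyAndAssignRoot", ie);
      }
    }
  }
}
{code}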

> When verifyAndAssignRoot throw exception, The deadServers state can not be changed.
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-4093
>                 URL: https://issues.apache.org/jira/browse/HBASE-4093
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: fulin wang
>            Assignee: fulin wang
>         Attachments: HBASE-4093-0.90.patch, HBASE-4093-0.90_V2.patch, HBASE-4093-0.90_V3.patch,
HBASE-4093-trunk_V2.patch, HBASE-4093-trunk_V3.patch, surefire-report.html
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> When verifyAndAssignRoot throws an exception, the deadServers state cannot be changed.
> The HMaster log then contains a lot of 'Not running balancer because processing dead regionserver(s): []' messages.
> HMaster log:
> 2011-07-09 01:38:31,820 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed
path hdfs://162.2.16.6:9000/hbase/Htable_UFDR_035/fe7e51c0a74fac096cea8cdb3c9497a6/recovered.edits/0000000000204525422
(wrote 8 edits in 61583ms)
> 2011-07-09 01:38:31,836 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed
splitting hdfs://162.2.16.6:9000/hbase/.logs/162-2-6-187,20020,1310107719056
> java.io.IOException: hdfs://162.2.16.6:9000/hbase/.logs/162-2-6-187,20020,1310107719056/162-2-6-187%3A20020.1310143885352,
entryStart=1878997244, pos=1879048192, end=2003890606, edit=80274
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:244)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:200)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:172)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.parseHLog(HLogSplitter.java:429)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:262)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
> 	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:201)
> 	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:114)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Could not obtain block: blk_1310107715558_225636 file=/hbase/.logs/162-2-6-187,20020,1310107719056/162-2-6-187%3A20020.1310143885352
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2491)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2256)
> 	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2441)
> 	at java.io.DataInputStream.read(DataInputStream.java:132)
> 	at java.io.DataInputStream.readFully(DataInputStream.java:178)
> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1984)
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1884)
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
> 	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:198)
> 	... 10 more
> 2011-07-09 01:38:33,052 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer
because processing dead regionserver(s): [162-2-6-187,20020,1310107719056]
> 2011-07-09 01:39:29,946 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan
of catalog table
> java.net.SocketTimeoutException: Call to /162.2.6.187:20020 failed on socket timeout
exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/162.2.6.187:38721
remote=/162.2.6.187:20020]
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
> 	at $Proxy6.getRegionInfo(Unknown Source)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:272)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:331)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:364)
> 	at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
> 	at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:237)
> 	at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:116)
> 	at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:85)
> 	at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/162.2.6.187:38721
remote=/162.2.6.187:20020]
> 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:116)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> 	at java.io.DataInputStream.readInt(DataInputStream.java:370)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
> 2011-07-09 01:39:29,946 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable
while processing event M_META_SERVER_SHUTDOWN
> java.net.SocketTimeoutException: Call to /162.2.6.187:20020 failed on socket timeout
exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/162.2.6.187:38721
remote=/162.2.6.187:20020]
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
> 	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
> 	at $Proxy6.getRegionInfo(Unknown Source)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:424)
> 	at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:471)
> 	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:90)
> 	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:126)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/162.2.6.187:38721
remote=/162.2.6.187:20020]
> 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:116)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> 	at java.io.DataInputStream.readInt(DataInputStream.java:370)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
> 	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
> 2011-07-09 01:40:26,474 DEBUG org.apache.hadoop.hbase.master.ServerManager: Server 162-2-6-187,20020,1310146825674
came back up, removed it from the dead servers list
> 2011-07-09 01:40:26,515 INFO org.apache.hadoop.hbase.master.ServerManager: Registering
server=162-2-6-187,20020,1310146825674, regionCount=0, userLoad=false
> 2011-07-09 01:40:28,410 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification
of .META.,,1 at address=162-2-6-187:20020; org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Region is not online: .META.,,1
> ...
> 2011-07-09 01:53:33,052 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer
because processing dead regionserver(s): []
> 2011-07-09 01:58:33,060 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer
because processing dead regionserver(s): []
> 2011-07-09 02:03:33,061 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer
because processing dead regionserver(s): []
> 2011-07-09 02:08:33,061 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer
because processing dead regionserver(s): []

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
