hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Master died after failed assignment of regionserver
Date Tue, 28 Jun 2011 21:14:56 GMT
Looks like I'll be fixing it in the context of HBASE-3984.

J-D

On Mon, Jun 27, 2011 at 10:21 AM, Jean-Daniel Cryans
<jdcryans@apache.org> wrote:
> Yeah the BulkAssigner uses a AssignmentManager.assign method that
> won't retry if it gets an exception because originally it was only
> used during startup (eg if you can't assign the regions properly at
> startup then kill yourself).
>
> In this case it happened to you when creating a table (it also uses
> the BulkAssigner now). To me it's a bug, it should just be assigned
> somewhere else. Mind filing a jira?
>
> As to how you got a 60 secs timeout, you should take a look at the RS
> log (and next time it happens also do a jstack).
>
> Thx,
>
> J-D
>
> On Mon, Jun 27, 2011 at 9:54 AM, Matt Davies <matt.davies@tynt.com> wrote:
>> Hey all,
>>
>> Interesting problem I'm running into.  It appears that our master has died
>> after a failed region assignment.  Any ideas as to why?
>>
>>
>> We are running version 0.90.1 in this cluster.
>> H3M1 = master
>> H3S* = regionservers
>>
>> out on H3S3 the regionserver process is running (but didn't respond per the
>> log)
>>
>> 2011-06-27 10:18:22,517 INFO
>> org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to
>> disable table ContentGuidance
>> 2011-06-27 10:18:22,522 INFO
>> org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1
>> regions.
>> 2011-06-27 10:18:23,527 INFO
>> org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table
>> is done=true
>> 2011-06-27 10:18:24,536 INFO
>> org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table
>> operation C_M_DELETE_TABLE on table ContentGuidance
>> 2011-06-27 10:18:24,584 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
>> Deleted region
>> ContentGuidance,,1298676132327.9d63b6173c7307c27aa9728f738f0ef3. from META
>> 2011-06-27 10:19:17,296 WARN org.apache.hadoop.hbase.zookeeper.ZKTable:
>> Moving table ContentGuidance state to enabled but was already enabled
>> 2011-06-27 10:19:17,313 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
>> HLog configuration: blocksize=64 MB, rollsize=60.8 MB, enabled=true,
>> flushlogentries=1, optionallogflushinternal=1000ms
>> 2011-06-27 10:19:17,349 INFO
>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
>> -- HDFS-200
>> 2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
>> New hlog
>> /hbase/ContentGuidance/f38bbc9bc672b5ea415b6dcc7e1da43a/.logs/hlog.1309191557313
>> 2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
>> Using getNumCurrentReplicas--HDFS-826
>> 2011-06-27 10:19:17,433 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> Onlined ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.;
>> next sequenceid=1
>> 2011-06-27 10:19:17,438 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
>> Added region
>> ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. to META
>> 2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>> Closed ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.
>> 2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
>> IPC Server handler 87 on 60000.logSyncer exiting
>> 2011-06-27 10:19:17,470 INFO
>> org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 1 region(s)
>> round-robin across 4 server(s)
>> 2011-06-27 10:19:17,472 INFO
>> org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
>> 2011-06-27 10:19:17,480 INFO
>> org.apache.hadoop.hbase.master.AssignmentManager: H3S4,60020,1308946657703
>> unassigned znodes=1 of total=1
>> 2011-06-27 10:19:21,530 INFO
>> org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to
>> disable table ContentGuidance
>> 2011-06-27 10:19:21,536 INFO
>> org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1
>> regions.
>> 2011-06-27 10:19:22,543 INFO
>> org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table
>> is done=true
>> 2011-06-27 10:19:23,544 INFO
>> org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table
>> operation C_M_DELETE_TABLE on table ContentGuidance
>> 2011-06-27 10:19:23,551 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
>> Deleted region
>> ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. from META
>> 2011-06-27 10:20:17,479 FATAL org.apache.hadoop.hbase.master.HMaster: Failed
>> assignment of regions to serverName=H3S3,60020,1308946657608,
>> load=(requests=0, regions=7, usedHeap=976, maxHeap=1011)
>> java.net.SocketTimeoutException: Call to H3S3/10..x.x.x3:60020 failed on
>> socket timeout exception: java.net.SocketTimeoutException: 60000 millis
>> timeout while waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/10..x.x.x:34904
>> remote=H3S3/10.x.x.x:60020]
>> at
>> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
>> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>> at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>> at $Proxy7.openRegions(Unknown Source)
>> at
>> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:566)
>> at
>> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:776)
>> at
>> org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.run(AssignmentManager.java:1310)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
>> waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/10..x.x.x:34904
>> remote=H3S3/10..x.x.x:60020]
>> at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>> at java.io.FilterInputStream.read(FilterInputStream.java:116)
>> at
>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
>> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>> at java.io.DataInputStream.readInt(DataInputStream.java:370)
>> at
>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
>> at
>> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
>> 2011-06-27 10:20:17,482 INFO org.apache.hadoop.hbase.master.HMaster:
>> Aborting
>> 2011-06-27 10:20:18,390 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
>> server on 60000
>> 2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
>> H3M1:60000-CatalogJanitor exiting
>> 2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.HMaster$1:
>> H3M1:60000-BalancerChore exiting
>> 2011-06-27 10:20:18,392 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
>> Server listener on 60000
>>
>

Mime
View raw message