hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stack <st...@duboce.net>
Subject Re: single RegionServer stuck, causing cluster to hang
Date Sat, 23 Aug 2014 18:28:08 GMT
Anything in your .out that could help explain our losing handlers if you
can't find anything in the logs?

You did the 'snipp' in the below, right Johannes?

RS Configuration:
===========================================================
<?xml version="1.0" encoding="UTF-8" standalone="no"?><configuration>
[snipp] no fancy stuff, all default, except absolute necessary settings
[snipp]
</configuration>

As per Qian, if hbase.ipc.server.callqueue.handler.factor, that could help
explain why we have handlers but they are not 'taking' from the call queue,
they are stuck taking on those queues that do not have calls queued.

St.Ack







On Sat, Aug 23, 2014 at 2:56 AM, Qiang Tian <tianq01@gmail.com> wrote:

> Did you set hbase.ipc.server.callqueue.handler.factor?
> it looks there are 3 queues, handlers on queue 1 are all gone as Stack
> mentioned.  jstack and pastebin regions server log would help.
>
>
>
>
>
> On Sat, Aug 23, 2014 at 7:02 AM, Stack <stack@duboce.net> wrote:
>
> > On Fri, Aug 22, 2014 at 3:24 PM, Johannes Schaback <
> > johannes.schaback@visual-meta.com> wrote:
> >
> > > ...
> > > I grep'ed "defaultRpcServer.handler=" on the log from that particular
> RS.
> > > The
> > > RS started at 15:35. After that, the handlers
> > >
> > > 6, 24, 0, 15, 28, 26, 7, 19, 21, 3, 5 and 23
> > >
> > > make an appearance in error messages of HDFS related exceptions:
> > >
> > > 2014-08-22 13:54:57,470 WARN
> > >  [defaultRpcServer.handler=6,queue=0,port=60020]
> hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 13:54:57,472 WARN
> > >  [defaultRpcServer.handler=6,queue=0,port=60020] hdfs.DFSClient:
> > Connection
> > > failure: Failed to connect to /192.168.3.233:50010 for file
> > > /hbase/data/default/image/dae1b8e3bfdb608571d09916bf0f
> > >  a156/cf/866a773857654b0d83275dc4e4558be6 for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294949058_255264636:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:37835, remote=/
> > > 192.168.3.233:50      010, for file
> > >
> > >
> >
> /hbase/data/default/image/dae1b8e3bfdb608571d09916bf0fa156/cf/866a773857654b0d83275dc4e4558be6,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294949058_255264636
> > > 2014-08-22 13:56:58,525 WARN
> > >  [defaultRpcServer.handler=24,queue=0,port=60020]
> > hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 13:56:58,799 WARN
> > >  [defaultRpcServer.handler=24,queue=0,port=60020] hdfs.DFSClient:
> > > Connection failure: Failed to connect to /192.168.3.142:50010 for file
> > > /hbase/data/default/image/cf493cbd9921ae6ca5e5281cc07
> > >  18ca2/cf/7f9618dcdeae40ddbff21165d08e0a83 for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294072754_254292863:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:59268, remote=/
> > > 192.168.3.142:5      0010, for file
> > >
> > >
> >
> /hbase/data/default/image/cf493cbd9921ae6ca5e5281cc0718ca2/cf/7f9618dcdeae40ddbff21165d08e0a83,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294072754_254292863
> > > 2014-08-22 14:08:01,632 WARN
> > >  [defaultRpcServer.handler=0,queue=0,port=60020]
> hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:08:01,633 WARN
> > >  [defaultRpcServer.handler=0,queue=0,port=60020] hdfs.DFSClient:
> > Connection
> > > failure: Failed to connect to /192.168.3.53:50010 for file
> > > /hbase/data/default/image/af02bc7fb404f4c054dcd64b44b0e
> > >  2a9/cf/a8881b6e2d5d41b3b56fd34fd4ca8ffd for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294876123_255182439:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:60737, remote=/
> > > 192.168.3.53:5001      0, for file
> > >
> > >
> >
> /hbase/data/default/image/af02bc7fb404f4c054dcd64b44b0e2a9/cf/a8881b6e2d5d41b3b56fd34fd4ca8ffd,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294876123_255182439
> > > 2014-08-22 14:11:34,192 WARN
> > >  [defaultRpcServer.handler=15,queue=0,port=60020] ipc.RpcServer:
> > > (responseTooSlow):
> > >
> > >
> >
> {"processingtimems":15432,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiReque
> > >      st)","client":"192.168.3.54:52838
> > >
> > >
> >
> ","starttimems":1408709478713,"queuetimems":0,"class":"HRegionServer","responsesize":99638,"method":"Multi"}
> > > 2014-08-22 14:22:21,847 WARN
> > >  [defaultRpcServer.handler=28,queue=1,port=60020]
> > hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:22:21,848 WARN
> > >  [defaultRpcServer.handler=28,queue=1,port=60020] hdfs.DFSClient:
> > > Connection failure: Failed to connect to /192.168.3.233:50010 for file
> > > /hbase/data/default/image/1d33a251862055cd999078bbd10
> > >  aa44c/cf/a7470351725f4cc192b6210bac9b7c44 for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294949316_255264902:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:44787, remote=/
> > > 192.168.3.233:5      0010, for file
> > >
> > >
> >
> /hbase/data/default/image/1d33a251862055cd999078bbd10aa44c/cf/a7470351725f4cc192b6210bac9b7c44,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294949316_255264902
> > > 2014-08-22 14:23:22,628 WARN
> > >  [defaultRpcServer.handler=26,queue=2,port=60020]
> > hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:23:22,628 WARN
> > >  [defaultRpcServer.handler=26,queue=2,port=60020] hdfs.DFSClient:
> > > Connection failure: Failed to connect to /192.168.3.142:50010 for file
> > > /hbase/data/default/image/4a9c830a4fe006f0a6af7418164
> > >  dd86d/cf/47d6f2c3f7054d40aed66cae9787c464 for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1293459594_253612452:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:37660, remote=/
> > > 192.168.3.142:5      0010, for file
> > >
> > >
> >
> /hbase/data/default/image/4a9c830a4fe006f0a6af7418164dd86d/cf/47d6f2c3f7054d40aed66cae9787c464,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1293459594_253612452
> > > 2014-08-22 14:25:35,003 WARN
> > >  [defaultRpcServer.handler=7,queue=1,port=60020]
> hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:25:35,004 WARN
> > >  [defaultRpcServer.handler=7,queue=1,port=60020] hdfs.DFSClient:
> > Connection
> > > failure: Failed to connect to /192.168.3.51:50010 for file
> > > /hbase/data/default/run_automaton_cache/da03f8123004be3
> > >  2659e1c8a51afbbf8/cf/1daf234e740f4b00889bb60e574dc79b for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294614349_254896345:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:53627, remote=/192
> > >  .168.3.51:50010, for file
> > >
> > >
> >
> /hbase/data/default/run_automaton_cache/da03f8123004be32659e1c8a51afbbf8/cf/1daf234e740f4b00889bb60e574dc79b,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294614349_254896345
> > > 2014-08-22 14:25:46,831 WARN
> > >  [defaultRpcServer.handler=19,queue=1,port=60020]
> > hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:25:46,832 WARN
> > >  [defaultRpcServer.handler=19,queue=1,port=60020] hdfs.DFSClient:
> > > Connection failure: Failed to connect to /192.168.3.30:50010 for file
> > > /hbase/data/default/image/1c3bab43e260ddb46a06cf04e293
> > >  e386/cf/740f8240ac9a4abc9e5fcf6ec7df18bc for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294151564_254380545:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:55660, remote=/
> > > 192.168.3.30:500      10, for file
> > >
> > >
> >
> /hbase/data/default/image/1c3bab43e260ddb46a06cf04e293e386/cf/740f8240ac9a4abc9e5fcf6ec7df18bc,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294151564_254380545
> > > 2014-08-22 14:28:22,395 WARN
> > >  [defaultRpcServer.handler=26,queue=2,port=60020]
> > hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:28:22,397 WARN
> > >  [defaultRpcServer.handler=26,queue=2,port=60020] hdfs.DFSClient:
> > > Connection failure: Failed to connect to /192.168.3.143:50010 for file
> > > /hbase/data/default/image/b4b48e20a606c431e393b674f92
> > >  79daf/cf/76ff3628306a48f187a285b2a21d9ac9 for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294308080_254554506:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:51759, remote=/
> > > 192.168.3.143:5      0010, for file
> > >
> > >
> >
> /hbase/data/default/image/b4b48e20a606c431e393b674f9279daf/cf/76ff3628306a48f187a285b2a21d9ac9,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294308080_254554506
> > > 2014-08-22 14:30:29,395 WARN
> > >  [defaultRpcServer.handler=21,queue=0,port=60020]
> > hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:30:29,396 WARN
> > >  [defaultRpcServer.handler=21,queue=0,port=60020] hdfs.DFSClient:
> > > Connection failure: Failed to connect to /192.168.3.30:50010 for file
> > > /hbase/data/default/image/77ceb80d73c065f1a4ba2ad3b7cf
> > >  04a2/cf/151c244177ac42b5996af0c9052660cc for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294365296_254618150:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:57238, remote=/
> > > 192.168.3.30:500      10, for file
> > >
> > >
> >
> /hbase/data/default/image/77ceb80d73c065f1a4ba2ad3b7cf04a2/cf/151c244177ac42b5996af0c9052660cc,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294365296_254618150
> > > 2014-08-22 14:31:34,016 WARN
> > >  [defaultRpcServer.handler=6,queue=0,port=60020]
> hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:31:34,018 WARN
> > >  [defaultRpcServer.handler=6,queue=0,port=60020] hdfs.DFSClient:
> > Connection
> > > failure: Failed to connect to /192.168.3.212:50010 for file
> > > /hbase/data/default/image/45578853a5c807919578043c7715
> > >  1efa/cf/40c00d49a7494929a319467d82b383da for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294930505_255244337:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:33973, remote=/
> > > 192.168.3.212:50      010, for file
> > >
> > >
> >
> /hbase/data/default/image/45578853a5c807919578043c77151efa/cf/40c00d49a7494929a319467d82b383da,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294930505_255244337
> > > 2014-08-22 14:33:55,994 WARN
> > >  [defaultRpcServer.handler=3,queue=0,port=60020]
> hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:33:55,995 WARN
> > >  [defaultRpcServer.handler=3,queue=0,port=60020] hdfs.DFSClient:
> > Connection
> > > failure: Failed to connect to /192.168.3.52:50010 for file
> > > /hbase/data/default/image/1f07e0160fbba29a2c26104dd5966
> > >  39f/cf/7e36136497fc4a5a9889797a1dfb5d3b for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294915607_255226575:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:41017, remote=/
> > > 192.168.3.52:5001      0, for file
> > >
> > >
> >
> /hbase/data/default/image/1f07e0160fbba29a2c26104dd596639f/cf/7e36136497fc4a5a9889797a1dfb5d3b,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294915607_255226575
> > > 2014-08-22 14:44:14,301 WARN
> > >  [defaultRpcServer.handler=3,queue=0,port=60020]
> hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:44:14,302 WARN
> > >  [defaultRpcServer.handler=3,queue=0,port=60020] hdfs.DFSClient:
> > Connection
> > > failure: Failed to connect to /192.168.3.154:50010 for file
> > > /hbase/data/default/image/bb082a5f098d1d2a95365545349c
> > >  77a4/cf/1af06c6fd36045bab888b4ec18e45f0f for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294911152_255221397:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:44711, remote=/
> > > 192.168.3.154:50      010, for file
> > >
> > >
> >
> /hbase/data/default/image/bb082a5f098d1d2a95365545349c77a4/cf/1af06c6fd36045bab888b4ec18e45f0f,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294911152_255221397
> > > 77a4/cf/1af06c6fd36045bab888b4ec18e45f0f for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294911152_255221397:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:44711, remote=/
> > > 192.168.3.154:50      010, for file
> > >
> > >
> >
> /hbase/data/default/image/bb082a5f098d1d2a95365545349c77a4/cf/1af06c6fd36045bab888b4ec18e45f0f,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294911152_255221397
> > > 2014-08-22 14:49:14,780 WARN
> > >  [defaultRpcServer.handler=5,queue=2,port=60020] ipc.RpcServer:
> > > (responseTooSlow):
> > >
> > >
> >
> {"processingtimems":15494,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiReques
> > >      t)","client":"192.168.3.11:53884
> > >
> > >
> >
> ","starttimems":1408711739283,"queuetimems":0,"class":"HRegionServer","responsesize":483492,"method":"Multi"}
> > > 2014-08-22 14:50:37,900 WARN
> > >  [defaultRpcServer.handler=7,queue=1,port=60020]
> hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 14:50:37,901 WARN
> > >  [defaultRpcServer.handler=7,queue=1,port=60020] hdfs.DFSClient:
> > Connection
> > > failure: Failed to connect to /192.168.3.64:50010 for file
> > > /hbase/data/default/image/a50d67b81c44c864265f5030c7c39
> > >  959/cf/840ff51e591946c487413273d5341a24 for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1294309355_254555905:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:45043, remote=/
> > > 192.168.3.64:5001      0, for file
> > >
> > >
> >
> /hbase/data/default/image/a50d67b81c44c864265f5030c7c39959/cf/840ff51e591946c487413273d5341a24,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1294309355_254555905
> > > 2014-08-22 15:09:35,289 WARN
> > >  [defaultRpcServer.handler=23,queue=2,port=60020]
> > hdfs.BlockReaderFactory:
> > > I/O error constructing remote block reader.
> > > 2014-08-22 15:09:35,289 WARN
> > >  [defaultRpcServer.handler=23,queue=2,port=60020] hdfs.DFSClient:
> > > Connection failure: Failed to connect to /192.168.3.63:50010 for file
> > > /hbase/data/default/image/1db3051c8d41943892c0230cb75b
> > >  c1f2/cf/75ae96ea8ca2477e8f83291d7e1fe7cb for block
> > >
> > >
> >
> BP-1157637685-192.168.3.192-1382642140917:blk_1293205473_253317737:java.io.IOException:
> > > Got error for OP_READ_BLOCK, self=/192.168.3.179:56326, remote=/
> > > 192.168.3.63:500      10, for file
> > >
> > >
> >
> /hbase/data/default/image/1db3051c8d41943892c0230cb75bc1f2/cf/75ae96ea8ca2477e8f83291d7e1fe7cb,
> > > for pool BP-1157637685-192.168.3.192-1382642140917 block
> > > 1293205473_253317737
> > > 2014-08-22 15:18:38,891 WARN
> > >  [defaultRpcServer.handler=5,queue=2,port=60020] ipc.RpcServer:
> > > (responseTooSlow):
> > >
> > >
> >
> {"processingtimems":26092,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiReques
> > >      t)","client":"192.168.3.91:40189
> > >
> > >
> >
> ","starttimems":1408713492722,"queuetimems":0,"class":"HRegionServer","responsesize":483492,"method":"Multi"}
> > >
> > > What's interesting is that not all crashed handlers are later missing
> > after
> > > the freeze.
> > >
> > > Does that substantiate your conjecture?
> > >
> > >
> > I saw those in your log.  They are WARN from DFSClient which should be
> ok.
> >  Any other exceptions going on in your logs that could explain our
> handler
> > loss?  Or, can you see any thing particular about say the last mention of
> > defaultRpcServer.handler=6 say?
> >
> > St.Ack
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message