From: Johannes Schaback
To: user@hbase.apache.org
Date: Sat, 23 Aug 2014 21:11:38 +0200
Subject: Re: single RegionServer stuck, causing cluster to hang

Hi,

we had to reduce the load on the cluster last night, which reduced the
frequency of the phenomenon. That is why I could not get a jstack dump yet:
it has not occurred for a couple of hours. We will now bring the load back
up, hoping to trigger it again.

Yes, I cut the properties out of the /debug dump because they are all
standard. hbase.ipc.server.callqueue.handler.factor is at its default:

hbase.ipc.server.callqueue.handler.factor = 0.1 (source: hbase-default.xml)

You can find the complete config of the RS here:
http://pastebin.com/iF1ibFb1

The hint about the .out files was a great one (I had never really looked at
them). The .out files are flooded with StackOverflowErrors:

Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020" java.lang.StackOverflowError
        at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
        at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
        at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
        at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
(and so on...)

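Since the open question in this thread is which call queue lost its handlers, a small script along these lines could tally crashed handlers per queue from a .out file. It is only a sketch: it assumes the thread-name format seen in the trace above (defaultRpcServer.handler=N,queue=M,port=P), and the function name crashed_handlers_by_queue and the sample lines are made up for illustration.

```python
import re
from collections import defaultdict

# Thread-name format as it appears in the .out file, e.g.
#   Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020" java.lang.StackOverflowError
CRASH_RE = re.compile(
    r'Exception in thread "defaultRpcServer\.handler=(\d+),queue=(\d+),port=\d+"'
)


def crashed_handlers_by_queue(lines):
    """Map each call queue index to the sorted handler ids that threw."""
    by_queue = defaultdict(set)
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            handler, queue = int(match.group(1)), int(match.group(2))
            by_queue[queue].add(handler)
    return {queue: sorted(ids) for queue, ids in sorted(by_queue.items())}


# Illustrative input only (hypothetical excerpt); in practice, iterate over
# the lines of the opened .out file instead.
sample = [
    'Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020" java.lang.StackOverflowError',
    '\tat org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)',
    'Exception in thread "defaultRpcServer.handler=25,queue=1,port=60020" java.lang.StackOverflowError',
    'Exception in thread "defaultRpcServer.handler=1,queue=1,port=60020" java.lang.StackOverflowError',
]

for queue, handlers in crashed_handlers_by_queue(sample).items():
    print(f"queue={queue}: {len(handlers)} crashed handler(s): {handlers}")
```

Comparing the per-queue counts with the number of handlers serving each queue would show whether any queue has been left without a live handler.
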
Filtering the .out file for "Exception" shows that several handlers crashed
like that:

Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=18,queue=0,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=23,queue=2,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=24,queue=0,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=2,queue=2,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=11,queue=2,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=25,queue=1,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=20,queue=2,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=19,queue=1,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=15,queue=0,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=1,queue=1,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=7,queue=1,port=60020" java.lang.StackOverflowError
Exception in thread "defaultRpcServer.handler=4,queue=1,port=60020" java.lang.StackOverflowError

Unfortunately, the exceptions are not timestamped, so I cannot correlate
their occurrence with the exact time when the RS starts filling up the queue.

On Sat, Aug 23, 2014 at 8:28 PM, Stack wrote:

> Anything in your .out that could help explain our losing handlers if you
> can't find anything in the logs?
>
> You did the 'snipp' in the below, right Johannes?
>
> > RS Configuration:
> > ===========================================================
> >
> > [snipp] no fancy stuff, all default, except absolute necessary settings
> > [snipp]
>
> As per Qian, if hbase.ipc.server.callqueue.handler.factor, that could help
> explain why we have handlers but they are not 'taking' from the call queue,
> they are stuck taking on those queues that do not have calls queued.
>
> St.Ack
>
> On Sat, Aug 23, 2014 at 2:56 AM, Qiang Tian wrote:
>
> > Did you set hbase.ipc.server.callqueue.handler.factor?
> > it looks there are 3 queues, handlers on queue 1 are all gone as Stack
> > mentioned. jstack and pastebin regions server log would help.
> >
> > On Sat, Aug 23, 2014 at 7:02 AM, Stack wrote:
> >
> > > On Fri, Aug 22, 2014 at 3:24 PM, Johannes Schaback <
> > > johannes.schaback@visual-meta.com> wrote:
> > >
> > > > ...
> > > > I grep'ed "defaultRpcServer.handler=" on the log from that particular RS.
> > > > The RS started at 15:35. After that, the handlers
> > > >
> > > > 6, 24, 0, 15, 28, 26, 7, 19, 21, 3, 5 and 23
> > > >
> > > > make an appearance in error messages of HDFS related exceptions:
> > > >
> > > > 2014-08-22 13:54:57,470 WARN [defaultRpcServer.handler=6,queue=0,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 13:54:57,472 WARN [defaultRpcServer.handler=6,queue=0,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.233:50010 for file /hbase/data/default/image/dae1b8e3bfdb608571d09916bf0fa156/cf/866a773857654b0d83275dc4e4558be6 for block BP-1157637685-192.168.3.192-1382642140917:blk_1294949058_255264636:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:37835, remote=/192.168.3.233:50010, for file /hbase/data/default/image/dae1b8e3bfdb608571d09916bf0fa156/cf/866a773857654b0d83275dc4e4558be6, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294949058_255264636
> > > > 2014-08-22 13:56:58,525 WARN [defaultRpcServer.handler=24,queue=0,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 13:56:58,799 WARN [defaultRpcServer.handler=24,queue=0,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.142:50010 for file /hbase/data/default/image/cf493cbd9921ae6ca5e5281cc0718ca2/cf/7f9618dcdeae40ddbff21165d08e0a83 for block BP-1157637685-192.168.3.192-1382642140917:blk_1294072754_254292863:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:59268, remote=/192.168.3.142:50010, for file /hbase/data/default/image/cf493cbd9921ae6ca5e5281cc0718ca2/cf/7f9618dcdeae40ddbff21165d08e0a83, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294072754_254292863
> > > > 2014-08-22 14:08:01,632 WARN [defaultRpcServer.handler=0,queue=0,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:08:01,633 WARN [defaultRpcServer.handler=0,queue=0,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.53:50010 for file /hbase/data/default/image/af02bc7fb404f4c054dcd64b44b0e2a9/cf/a8881b6e2d5d41b3b56fd34fd4ca8ffd for block BP-1157637685-192.168.3.192-1382642140917:blk_1294876123_255182439:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:60737, remote=/192.168.3.53:50010, for file /hbase/data/default/image/af02bc7fb404f4c054dcd64b44b0e2a9/cf/a8881b6e2d5d41b3b56fd34fd4ca8ffd, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294876123_255182439
> > > > 2014-08-22 14:11:34,192 WARN [defaultRpcServer.handler=15,queue=0,port=60020] ipc.RpcServer: (responseTooSlow): {"processingtimems":15432,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"192.168.3.54:52838","starttimems":1408709478713,"queuetimems":0,"class":"HRegionServer","responsesize":99638,"method":"Multi"}
> > > > 2014-08-22 14:22:21,847 WARN [defaultRpcServer.handler=28,queue=1,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:22:21,848 WARN [defaultRpcServer.handler=28,queue=1,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.233:50010 for file /hbase/data/default/image/1d33a251862055cd999078bbd10aa44c/cf/a7470351725f4cc192b6210bac9b7c44 for block BP-1157637685-192.168.3.192-1382642140917:blk_1294949316_255264902:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:44787, remote=/192.168.3.233:50010, for file /hbase/data/default/image/1d33a251862055cd999078bbd10aa44c/cf/a7470351725f4cc192b6210bac9b7c44, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294949316_255264902
> > > > 2014-08-22 14:23:22,628 WARN [defaultRpcServer.handler=26,queue=2,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:23:22,628 WARN [defaultRpcServer.handler=26,queue=2,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.142:50010 for file /hbase/data/default/image/4a9c830a4fe006f0a6af7418164dd86d/cf/47d6f2c3f7054d40aed66cae9787c464 for block BP-1157637685-192.168.3.192-1382642140917:blk_1293459594_253612452:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:37660, remote=/192.168.3.142:50010, for file /hbase/data/default/image/4a9c830a4fe006f0a6af7418164dd86d/cf/47d6f2c3f7054d40aed66cae9787c464, for pool BP-1157637685-192.168.3.192-1382642140917 block 1293459594_253612452
> > > > 2014-08-22 14:25:35,003 WARN [defaultRpcServer.handler=7,queue=1,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:25:35,004 WARN [defaultRpcServer.handler=7,queue=1,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.51:50010 for file /hbase/data/default/run_automaton_cache/da03f8123004be32659e1c8a51afbbf8/cf/1daf234e740f4b00889bb60e574dc79b for block BP-1157637685-192.168.3.192-1382642140917:blk_1294614349_254896345:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:53627, remote=/192.168.3.51:50010, for file /hbase/data/default/run_automaton_cache/da03f8123004be32659e1c8a51afbbf8/cf/1daf234e740f4b00889bb60e574dc79b, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294614349_254896345
> > > > 2014-08-22 14:25:46,831 WARN [defaultRpcServer.handler=19,queue=1,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:25:46,832 WARN [defaultRpcServer.handler=19,queue=1,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.30:50010 for file /hbase/data/default/image/1c3bab43e260ddb46a06cf04e293e386/cf/740f8240ac9a4abc9e5fcf6ec7df18bc for block BP-1157637685-192.168.3.192-1382642140917:blk_1294151564_254380545:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:55660, remote=/192.168.3.30:50010, for file /hbase/data/default/image/1c3bab43e260ddb46a06cf04e293e386/cf/740f8240ac9a4abc9e5fcf6ec7df18bc, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294151564_254380545
> > > > 2014-08-22 14:28:22,395 WARN [defaultRpcServer.handler=26,queue=2,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:28:22,397 WARN [defaultRpcServer.handler=26,queue=2,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.143:50010 for file /hbase/data/default/image/b4b48e20a606c431e393b674f9279daf/cf/76ff3628306a48f187a285b2a21d9ac9 for block BP-1157637685-192.168.3.192-1382642140917:blk_1294308080_254554506:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:51759, remote=/192.168.3.143:50010, for file /hbase/data/default/image/b4b48e20a606c431e393b674f9279daf/cf/76ff3628306a48f187a285b2a21d9ac9, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294308080_254554506
> > > > 2014-08-22 14:30:29,395 WARN [defaultRpcServer.handler=21,queue=0,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:30:29,396 WARN [defaultRpcServer.handler=21,queue=0,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.30:50010 for file /hbase/data/default/image/77ceb80d73c065f1a4ba2ad3b7cf04a2/cf/151c244177ac42b5996af0c9052660cc for block BP-1157637685-192.168.3.192-1382642140917:blk_1294365296_254618150:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:57238, remote=/192.168.3.30:50010, for file /hbase/data/default/image/77ceb80d73c065f1a4ba2ad3b7cf04a2/cf/151c244177ac42b5996af0c9052660cc, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294365296_254618150
> > > > 2014-08-22 14:31:34,016 WARN [defaultRpcServer.handler=6,queue=0,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:31:34,018 WARN [defaultRpcServer.handler=6,queue=0,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.212:50010 for file /hbase/data/default/image/45578853a5c807919578043c77151efa/cf/40c00d49a7494929a319467d82b383da for block BP-1157637685-192.168.3.192-1382642140917:blk_1294930505_255244337:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:33973, remote=/192.168.3.212:50010, for file /hbase/data/default/image/45578853a5c807919578043c77151efa/cf/40c00d49a7494929a319467d82b383da, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294930505_255244337
> > > > 2014-08-22 14:33:55,994 WARN [defaultRpcServer.handler=3,queue=0,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:33:55,995 WARN [defaultRpcServer.handler=3,queue=0,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.52:50010 for file /hbase/data/default/image/1f07e0160fbba29a2c26104dd596639f/cf/7e36136497fc4a5a9889797a1dfb5d3b for block BP-1157637685-192.168.3.192-1382642140917:blk_1294915607_255226575:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:41017, remote=/192.168.3.52:50010, for file /hbase/data/default/image/1f07e0160fbba29a2c26104dd596639f/cf/7e36136497fc4a5a9889797a1dfb5d3b, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294915607_255226575
> > > > 2014-08-22 14:44:14,301 WARN [defaultRpcServer.handler=3,queue=0,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:44:14,302 WARN [defaultRpcServer.handler=3,queue=0,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.154:50010 for file /hbase/data/default/image/bb082a5f098d1d2a95365545349c77a4/cf/1af06c6fd36045bab888b4ec18e45f0f for block BP-1157637685-192.168.3.192-1382642140917:blk_1294911152_255221397:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:44711, remote=/192.168.3.154:50010, for file /hbase/data/default/image/bb082a5f098d1d2a95365545349c77a4/cf/1af06c6fd36045bab888b4ec18e45f0f, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294911152_255221397
> > > > 2014-08-22 14:49:14,780 WARN [defaultRpcServer.handler=5,queue=2,port=60020] ipc.RpcServer: (responseTooSlow): {"processingtimems":15494,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"192.168.3.11:53884","starttimems":1408711739283,"queuetimems":0,"class":"HRegionServer","responsesize":483492,"method":"Multi"}
> > > > 2014-08-22 14:50:37,900 WARN [defaultRpcServer.handler=7,queue=1,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 14:50:37,901 WARN [defaultRpcServer.handler=7,queue=1,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.64:50010 for file /hbase/data/default/image/a50d67b81c44c864265f5030c7c39959/cf/840ff51e591946c487413273d5341a24 for block BP-1157637685-192.168.3.192-1382642140917:blk_1294309355_254555905:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:45043, remote=/192.168.3.64:50010, for file /hbase/data/default/image/a50d67b81c44c864265f5030c7c39959/cf/840ff51e591946c487413273d5341a24, for pool BP-1157637685-192.168.3.192-1382642140917 block 1294309355_254555905
> > > > 2014-08-22 15:09:35,289 WARN [defaultRpcServer.handler=23,queue=2,port=60020] hdfs.BlockReaderFactory: I/O error constructing remote block reader.
> > > > 2014-08-22 15:09:35,289 WARN [defaultRpcServer.handler=23,queue=2,port=60020] hdfs.DFSClient: Connection failure: Failed to connect to /192.168.3.63:50010 for file /hbase/data/default/image/1db3051c8d41943892c0230cb75bc1f2/cf/75ae96ea8ca2477e8f83291d7e1fe7cb for block BP-1157637685-192.168.3.192-1382642140917:blk_1293205473_253317737:java.io.IOException: Got error for OP_READ_BLOCK, self=/192.168.3.179:56326, remote=/192.168.3.63:50010, for file /hbase/data/default/image/1db3051c8d41943892c0230cb75bc1f2/cf/75ae96ea8ca2477e8f83291d7e1fe7cb, for pool BP-1157637685-192.168.3.192-1382642140917 block 1293205473_253317737
> > > > 2014-08-22 15:18:38,891 WARN [defaultRpcServer.handler=5,queue=2,port=60020] ipc.RpcServer: (responseTooSlow): {"processingtimems":26092,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"192.168.3.91:40189","starttimems":1408713492722,"queuetimems":0,"class":"HRegionServer","responsesize":483492,"method":"Multi"}
> > > >
> > > > What's interesting is that not all crashed handlers are later missing after
> > > > the freeze.
> > > >
> > > > Does that substantiate your conjecture?
> > >
> > > I saw those in your log. They are WARN from DFSClient which should be ok.
> > > Any other exceptions going on in your logs that could explain our handler
> > > loss? Or, can you see any thing particular about say the last mention of
> > > defaultRpcServer.handler=6 say?
> > >
> > > St.Ack

--
LadenZeile.de powered by
Visual Meta GmbH - www.visual-meta.com
Tel.: +49 30 / 609 84 88 20
Fax: +49 30 / 609 84 88 21
E-Mail: johannes.schaback@visual-meta.com

Visual Meta GmbH, Schützenstraße 25, 10117 Berlin
Managing directors: Robert M. Maier, Johannes Schaback
Commercial register: HRB 115795 B, Amtsgericht Charlottenburg
VAT ID: DE263760203