Subject: Re: hbase region servers refuse connection
From: Qiang Tian
To: "user@hbase.apache.org"
Date: Mon, 14 Jul 2014 18:06:29 +0800

Hi YuMing, :)
Yes, several iterations of jstack on the problem regionserver could help
identify the problem.

Rural, you have probably hit HBASE-11277 (and probably YuMing as well):
reader thread 14 loops again and again in the stack below (high CPU usage),
while listener thread 12 is blocked and cannot accept new connections.

Thread 12 (RpcServer.listener,port=60020):
  State: BLOCKED
  Blocked count: 123264191
  Waited count: 0
  Blocked on org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader@77f87716
  Blocked by 14 (RpcServer.reader=1,port=60020)
  Stack:
    org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.registerChannel(RpcServer.java:598)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:755)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:673)
Thread 24 (RpcServer.responder):

Thread 14 (RpcServer.reader=1,port=60020):
  State: RUNNABLE
  Blocked count: 12510492
  Waited count: 12826560
  Stack:
    sun.nio.ch.FileDispatcher.read0(Native Method)
    sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
    sun.nio.ch.IOUtil.read(IOUtil.java:224)
    sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
    org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2438)
    org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2404)
    org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1498)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:780)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:568)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:543)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:701)
Thread 13 (RpcServer.reader=0,port=60020):

2014-07-10 14:13:49,614 WARN [RpcServer.reader=7,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
    at sun.nio.ch.IOUtil.read(IOUtil.java:224)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
    at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2404)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1425)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:780)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:568)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:543)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)

On Mon, Jul 14, 2014 at 9:24 AM, Rural Hunter wrote:

> Yes. But you may want to check if there are many connections in SYN_RECV
> state when the problem happens.
>
>
> On 2014/7/14 4:18, vito wrote:
>
>> Hi Rural,
>>
>>
>> Do you mean the following action you have taken? Thanks a lot.
>>
>> "Anyway, I just changed these kernel settings:
>> net.core.somaxconn=1024 (original 128)
>> net.ipv4.tcp_synack_retries=2 (original 5)"
>>
>>
>>
>> --
>> View this message in context: http://apache-hbase.679495.n3.nabble.com/hbase-region-servers-refuse-connection-tp4061278p4061293.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
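
When comparing several dumps taken a few seconds apart, it may help to pull out just the "who blocks whom" relation instead of reading every stack. The following is a minimal Python sketch, not an HBase tool. It assumes the dump text has the same layout as the thread entries pasted in this message ("Thread N (name):", "State: ...", "Blocked by M (...)"); raw jstack output uses a different layout and would need different patterns. The names parse_threads, blockers and SAMPLE_DUMP are made up for illustration.

```python
import re

# Patterns for the dump layout pasted in this thread (an assumption; raw
# jstack output is formatted differently and would not match).
THREAD_RE = re.compile(r"^Thread (\d+) \((.*)\):$")
STATE_RE = re.compile(r"^State: (\w+)$")
BLOCKED_BY_RE = re.compile(r"^Blocked by (\d+) \(")

# Shortened copy of the two thread entries quoted in the message.
SAMPLE_DUMP = """\
Thread 12 (RpcServer.listener,port=60020):
  State: BLOCKED
  Blocked count: 123264191
  Waited count: 0
  Blocked on org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader@77f87716
  Blocked by 14 (RpcServer.reader=1,port=60020)
  Stack:
    org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.registerChannel(RpcServer.java:598)
Thread 14 (RpcServer.reader=1,port=60020):
  State: RUNNABLE
  Blocked count: 12510492
  Waited count: 12826560
  Stack:
    sun.nio.ch.FileDispatcher.read0(Native Method)
"""


def parse_threads(dump):
    """Return {thread_id: {"name", "state", "blocked_by"}} for each entry."""
    threads = {}
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        match = THREAD_RE.match(line)
        if match:
            # A new "Thread N (name):" header starts a new entry.
            current = {"name": match.group(2), "state": None, "blocked_by": None}
            threads[int(match.group(1))] = current
            continue
        if current is None:
            continue  # ignore anything before the first thread header
        match = STATE_RE.match(line)
        if match:
            current["state"] = match.group(1)
            continue
        match = BLOCKED_BY_RE.match(line)
        if match:
            current["blocked_by"] = int(match.group(1))
    return threads


def blockers(threads):
    """Map each blocking thread id to the sorted ids of threads it blocks."""
    result = {}
    for tid, info in threads.items():
        if info["state"] == "BLOCKED" and info["blocked_by"] is not None:
            result.setdefault(info["blocked_by"], []).append(tid)
    return {blocker: sorted(ids) for blocker, ids in result.items()}


threads = parse_threads(SAMPLE_DUMP)
for blocker, blocked in blockers(threads).items():
    info = threads.get(blocker, {"name": "?", "state": "?"})
    print(f"thread {blocker} ({info['name']}, {info['state']}) blocks {blocked}")
```

If the same reader thread shows up as the blocker of the listener in every dump, and it is RUNNABLE each time, that would be consistent with the looping reader described above.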