Date: Fri, 4 Dec 2015 21:12:10 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-14926) Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading

     [ https://issues.apache.org/jira/browse/HBASE-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-14926:
--------------------------
    Attachment: 14926v2.txt

OK. Played more trying to capture the thriftserver in the described state. I saw the issue in a thread dump, but it proved elusive (I wasn't sure whether it was my timeout that fixed it or not). The change is pretty basic, so it should be safe enough: it sets a timeout on the server socket, which we want anyway. Going to commit on the back of @apurtell's +1.
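To illustrate the idea behind the fix (this is not the attached patch), here is a minimal sketch using plain java.net sockets rather than Thrift. The class name ReadTimeoutSketch, the method readTimesOut, and the 200 ms value are hypothetical, chosen only for this example. It assumes a client that connects and then goes silent, which is roughly what a crashed client looks like to the server when no FIN or RST ever arrives.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration only; not code from HBase or from the patch.
public class ReadTimeoutSketch {

    /**
     * Accepts one connection from a client that never sends a byte and
     * reports whether the server-side read gave up after timeoutMs.
     */
    static boolean readTimesOut(int timeoutMs) throws IOException {
        // Port 0 asks the OS for any free port; bind to loopback only.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             // Stand-in for a crashed client: connects, then stays silent.
             Socket silentClient = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Without this call the read() below would block indefinitely,
            // which is the state the worker threads in the dump are in.
            accepted.setSoTimeout(timeoutMs);
            InputStream in = accepted.getInputStream();
            try {
                in.read();
                return false; // data or EOF arrived, so no timeout occurred
            } catch (SocketTimeoutException e) {
                return true; // the thread gets control back and can clean up
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

With the setSoTimeout call removed, the same program would sit in socketRead0 for as long as the silent client stays connected. A real server would pick a timeout much longer than 200 ms so that slow but healthy clients are not disconnected.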
v2 adds logging of the timeout to thrift2 version too and adds a sentence or two more to examples on how to get going with thrift server.

> Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14926
>                 URL: https://issues.apache.org/jira/browse/HBASE-14926
>             Project: HBase
>          Issue Type: Bug
>          Components: Thrift
>    Affects Versions: 2.0.0, 1.2.0, 1.1.2, 1.3.0, 1.0.3, 0.98.16
>            Reporter: stack
>            Assignee: stack
>         Attachments: 14926.patch, 14926v2.txt
>
>
> Thrift server is hung. All worker threads are doing this:
> {code}
> "thrift-worker-0" daemon prio=10 tid=0x00007f0bb95c2800 nid=0xf6a7 runnable [0x00007f0b956e0000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:152)
>         at java.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         - locked <0x000000066d859490> (a java.io.BufferedInputStream)
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>         at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
>         at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
>         at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
>         at org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> They never recover.
> I don't have client side logs.
> We've been here before: HBASE-4967 "connected client thrift sockets should have a server side read timeout" but this patch only got applied to fb branch (and thrift has changed since then).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)