Date: Tue, 10 Jun 2014 10:42:45 -0700
Subject: Writable RPC had a lot of leftover TCP connections in CLOSE_WAIT after RPC_TIMEOUT is enabled
From: Anfernee Xu
To: user

Hi,

I'm using hadoop-2.2.0 and rely on Hadoop's WritableRpcEngine to build my distributed application. The application has a 'heartbeat' interface that checks availability periodically. To detect potential failures, I enabled "rpc_timeout" when creating the proxy, as below:

    int rpcTimeout = 1000; // 1 second as RPC timeout
    RPC.waitForProxy(
            MyApplicationInterface.class, MyApplicationInterface.versionID,
            socAddr, conf, rpcTimeout, timeout);

Everything went fine initially, and I could see failures being detected by the heartbeat. But after a period of time (2 days or so), I saw a lot of TCP connections in CLOSE_WAIT state on the server side, and the client was not able to connect to the server again.
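For context, CLOSE_WAIT is the TCP state a socket sits in after the remote peer has closed its end while the local process has not yet closed its own. Below is a minimal sketch of that mechanism using plain java.net sockets. It is not Hadoop's RPC code, and the class and method names (CloseWaitSketch, serveOnce, runDemo) are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration only; this is not how Hadoop's RPC server is written.
class CloseWaitSketch {

    // Accepts one connection and reads until the peer closes its end.
    // Returns the number of bytes received before end-of-stream.
    static int serveOnce(ServerSocket server) throws IOException {
        // try-with-resources closes the accepted socket on exit, which is
        // what moves this side out of CLOSE_WAIT.
        try (Socket accepted = server.accept()) {
            InputStream in = accepted.getInputStream();
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            // read() returned -1: the peer has closed. Until close() runs,
            // the local socket is in CLOSE_WAIT.
            return count;
        }
    }

    // Starts a loopback server, lets a client send 3 bytes and disconnect,
    // and returns how many bytes the server side read.
    static int runDemo() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Thread client = new Thread(() -> {
                try (Socket s = new Socket(loopback, server.getLocalPort())) {
                    s.getOutputStream().write(new byte[] {1, 2, 3});
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            client.start();
            int received = serveOnce(server);
            client.join();
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes read before peer closed: " + runDemo());
    }
}
```

If the accepted socket were never closed in serveOnce, each disconnected client would leave one connection in CLOSE_WAIT on the accepting side, which is the pattern I am seeing (though I have not confirmed the cause in my case).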

Any clue about this?

Thanks

--
--Anfernee
