From: Kiran Chitturi
To: "solr-user@lucene.apache.org"
Subject: Re: SolrJ Socket Leak
Date: Mon, 17 Feb 2014 08:03:55 +0000

Jared,

I faced a similar issue when using CloudSolrServer with Solr. As Shawn
pointed out, the 'TIME_WAIT' status appears when the connection is closed
by the HTTP client. The HTTP client closes a connection whenever it thinks
the connection is stale
(https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html#d5e405).
Even the docs point out that stale connection checking is not fully
reliable.

I see two ways to get around this:

1. Enable 'SO_REUSEADDR'
2. Disable stale connection checks.
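
For reference, option 1 at the plain JDK level looks roughly like the sketch below. It uses only java.net.Socket rather than HttpClient, and the class and method names (ReuseAddressSketch, newReusableSocket) are made up for illustration. Whether SO_REUSEADDR actually reduces TIME_WAIT build-up for outbound client connections depends on the OS, so treat this as something to experiment with, not a guaranteed fix.

```java
import java.io.IOException;
import java.net.Socket;

public class ReuseAddressSketch {

    /** Creates an unconnected socket with SO_REUSEADDR switched on (hypothetical helper). */
    static Socket newReusableSocket() throws IOException {
        Socket socket = new Socket();   // unconnected and unbound
        socket.setReuseAddress(true);   // set before bind/connect so the option can take effect
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = newReusableSocket()) {
            System.out.println("SO_REUSEADDR enabled: " + socket.getReuseAddress());
        }
    }
}
```

The option is set before the socket is bound or connected because the JDK documentation leaves the behaviour undefined when it is changed afterwards.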

Also, by default, creating a CloudSolrServer (CSS) does not explicitly
configure any HTTP client parameters
(https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java#L124).
In this case, the default configuration parameters (max connections, max
connections per host) are used for the HTTP connection. You can explicitly
configure these params when creating CSS using HttpClientUtil:

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
    params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
    params.set(HttpClientUtil.PROP_FOLLOW_REDIRECTS, false);
    params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 30000);

    // Create the client once and hand it to the load-balancing server.
    final HttpClient client = HttpClientUtil.createClient(params);
    LBHttpSolrServer lb = new LBHttpSolrServer(client);
    CloudSolrServer server = new CloudSolrServer(zkConnect, lb);

Currently, I am using HTTP client 4.3.2 and building the client myself when
creating the CSS. I also use the 'SO_REUSEADDR' option and I haven't seen
'TIME_WAIT' since then (maybe because of better handling of stale
connections in 4.3.2, or because 'SO_REUSEADDR' is enabled).

My current HTTP client code looks like this (works only with HTTP client
4.3.2):

    HttpClientBuilder httpBuilder = HttpClientBuilder.create();

    // Socket-level settings: address reuse and a 10 s read timeout.
    SocketConfig.Builder socketConfig = SocketConfig.custom();
    socketConfig.setSoReuseAddress(true);
    socketConfig.setSoTimeout(10000);
    httpBuilder.setDefaultSocketConfig(socketConfig.build());

    // Connection pool limits.
    httpBuilder.setMaxConnTotal(300);
    httpBuilder.setMaxConnPerRoute(100);

    httpBuilder.disableRedirectHandling();
    httpBuilder.useSystemProperties();

    // Build the client from the builder configured above.
    HttpClient httpClient = httpBuilder.build();

    // 'parser' is a ResponseParser created elsewhere.
    LBHttpSolrServer lb = new LBHttpSolrServer(httpClient, parser);
    CloudSolrServer server = new CloudSolrServer(zkConnect, lb);

There should be a way to configure socket reuse with 4.2.3 too. You can try
different configurations.

I am surprised you have 'TIME_WAIT' connections even after 30 minutes,
because I think the OS should clear a 'TIME_WAIT' connection by default
within 2 minutes.

HTH,

-- 
Kiran Chitturi

On 2/13/14 12:38 PM, "Jared Rodriguez" wrote:

>I am using solr/solrj 4.6.1 along with the apache httpclient 4.3.2 as part
>of a web application which connects to the solr server via solrj
>using CloudSolrServer(); The web application is wired up with Guice, and
>there is a single instance of the CloudSolrServer class used by all
>inbound requests. All this is running on Amazon.
>
>Basically, everything looks and runs fine for a while, but even with
>moderate concurrency, solrj starts leaving sockets open. We are handling
>only about 250 connections to the web app per minute and each of these
>issues from 3 - 7 requests to solr. Over a 30 minute period of this type
>of use, we end up with many 1000s of lingering sockets. I can see these
>when running netstat
>
>tcp 0 0 ip-10-80-14-26.ec2.in:41098 ip-10-99-145-47.ec2.i:glrpc TIME_WAIT
>
>All to the same target host, which is my solr server. There are no other
>pieces of infrastructure on that box, just solr. Eventually, the server
>just dies as no further sockets can be opened and the opened ones are not
>reused.
>
>The solr server itself is unfazed and running like a champ. Average
>time per request of 0.126, as seen in the solr web app admin UI query
>handler stats.
>
>Apache httpclient had a bunch of leakage from version 4.2.x that they
>cleaned up and refactored in 4.3.x, which is why I upgraded. Currently,
>solrj makes use of the old leaky 4.2 classes for establishing connections
>and using a connection pool.
>
>http://www.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.3.x.txt
>
>-- 
>Jared Rodriguez
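
As a rough sanity check on the figures in the quoted message, the sketch below estimates how many sockets would sit in 'TIME_WAIT' at steady state. It rests on two assumptions that are not confirmed anywhere in this thread: every Solr request opens a fresh connection that the client then closes (no keep-alive reuse at all), and 'TIME_WAIT' lasts the 2 minutes guessed above. The class and method names are made up for illustration.

```java
public class TimeWaitEstimate {

    /**
     * Steady-state TIME_WAIT count = connections closed per minute
     * multiplied by the minutes each one lingers in TIME_WAIT.
     */
    static long steadyStateTimeWait(long webRequestsPerMinute,
                                    long solrRequestsPerWebRequest,
                                    long timeWaitMinutes) {
        return webRequestsPerMinute * solrRequestsPerWebRequest * timeWaitMinutes;
    }

    public static void main(String[] args) {
        // 250 web requests per minute, each issuing 3 to 7 Solr requests (from the quoted message).
        long low = steadyStateTimeWait(250, 3, 2);   // 250 * 3 * 2 = 1500
        long high = steadyStateTimeWait(250, 7, 2);  // 250 * 7 * 2 = 3500
        System.out.println("Estimated TIME_WAIT sockets: " + low + " to " + high);
    }
}
```

Under those assumptions the estimate lands at 1500 to 3500 sockets, which is in line with "many 1000s" and suggests that connections are being closed instead of reused from the pool.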