Subject: Re: HBaseClient isn't reusing connections but creating a new one each time
From: Jeff Whiting
To: user@hbase.apache.org
Date: Fri, 29 Mar 2013 13:44:13 -0600

I am using cdh4.1.3 which roughly maps to 0.92.1 with patches.

~Jeff


On Fri, Mar 29, 2013 at 1:40 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Nice one.. Good find.
>
>
> On Sat, Mar 30, 2013 at 12:30 AM, Ted Yu wrote:
>
> > Can you tell us the version of HBase you are using ?
> >
> > Gary did some cleanup in:
> >
> > r1439723 | garyh | 2013-01-28 16:50:02 -0800 (Mon, 28 Jan 2013) | 1 line
> >
> > HBASE-7626 Backport client connection cleanup from HBASE-7460
> >
> > This is the current code in getConnection() in 0.94 branch:
> >     ConnectionId remoteId = new ConnectionId(addr, protocol, ticket,
> >         rpcTimeout);
> >     synchronized (connections) {
> >       connection = connections.get(remoteId);
> >       if (connection == null) {
> >         connection = createConnection(remoteId);
> >         connections.put(remoteId, connection);
> >       }
> >     }
> >     connection.addCall(call);
> >
> >
> > On Fri, Mar 29, 2013 at 11:41 AM, Jeff Whiting wrote:
> >
> > > After noticing a lot of threads, I turned on debugging logging for hbase
> > > client and saw this many times counting up constantly:
> > > HBaseClient:531 - IPC Client (687163870) connection to
> > > /10.1.37.21:60020 from jeff: starting, having connections 1364
> > >
> > > At that point in my code it was up to 1364 different connections (and
> > > threads). Those connections will eventually drop off after the idle time
> > > is reached "conf.getInt("hbase.ipc.client.connection.maxidletime", 10000)".
> > > But during periods of activity the number of threads can get very high.
> > >
> > > Additionally I was able to confirm the large number of threads by doing:
> > >
> > > jstack | grep IPC
> > >
> > >
> > > So I started digging around in the code...
> > >
> > > In HBaseClient.getConnection it attempts to reuse previous connections:
> > >
> > >     ConnectionId remoteId = new ConnectionId(addr, protocol, ticket,
> > >         rpcTimeout);
> > >     do {
> > >       synchronized (connections) {
> > >         connection = connections.get(remoteId);
> > >         if (connection == null) {
> > >           LOG.error("poolsize: "+getPoolSize(conf));
> > >           connection = new Connection(remoteId);
> > >           connections.put(remoteId, connection);
> > >         }
> > >       }
> > >     } while (!connection.addCall(call));
> > >
> > >
> > > It does this by using the connection id as the key to the pool. All of this
> > > seems good except ConnectionId never hashes to the same value so it cannot
> > > reuse any connection.
> > >
> > > From my understanding of the code, here is why.
> > >
> > > In HBaseClient.ConnectionId
> > >
> > >     @Override
> > >     public boolean equals(Object obj) {
> > >       if (obj instanceof ConnectionId) {
> > >         ConnectionId id = (ConnectionId) obj;
> > >         return address.equals(id.address) && protocol == id.protocol &&
> > >             ((ticket != null && ticket.equals(id.ticket)) ||
> > >              (ticket == id.ticket)) && rpcTimeout == id.rpcTimeout;
> > >       }
> > >       return false;
> > >     }
> > >
> > >     @Override // simply use the default Object#hashcode() ?
> > >     public int hashCode() {
> > >       return (address.hashCode() + PRIME * (
> > >           PRIME * System.identityHashCode(protocol) ^
> > >           (ticket == null ? 0 : ticket.hashCode()) )) ^ rpcTimeout;
> > >     }
> > >
> > > It uses the protocol and the ticket in both functions. However, going
> > > back through all of the layers I think I found the problem.
> > >
> > > Problem:
> > >
> > > HBaseRPC.java:
> > >   public static VersionedProtocol getProxy(
> > >       Class<? extends VersionedProtocol> protocol,
> > >       long clientVersion, InetSocketAddress addr, Configuration conf,
> > >       SocketFactory factory, int rpcTimeout) throws IOException {
> > >     return getProxy(protocol, clientVersion, addr,
> > >         User.getCurrent(), conf, factory, rpcTimeout);
> > >   }
> > >
> > > User.getCurrent() always returns a new User object. That user instance is
> > > eventually passed down to ConnectionId. However, the User object doesn't
> > > implement hashCode() or equals(), so one ConnectionId won't ever match
> > > another ConnectionId.
> > >
> > >
> > > There are several possible solutions.
> > > 1. implement hashCode and equals for the User.
> > > 2. only create one User object and reuse it.
> > > 3. don't look at ticket in ConnectionId (probably a bad idea)
> > >
> > >
> > > Thoughts? Has anyone else noticed this behavior? Should I open up a jira
> > > issue?
> > >
> > > I originally ran into the problem due to OS X having a limited number of
> > > threads per user (and I was not able to increase the limit) and our unit
> > > tests making requests quickly enough that I ran out of threads. I tried out
> > > all three solutions and each worked fine for my application. However, I'm not
> > > sure what changing the behavior would do to others' applications, especially
> > > those that use SecureHadoop.
> > >
> > >
> > >
> > > Thanks,
> > > ~Jeff
> > >
> > > --
> > > Jeff Whiting
> > > Qualtrics Senior Software Engineer
> > > jeffw@qualtrics.com
> > >
> >
>



--
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com
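
For anyone who wants to see the effect described above in isolation, here is a minimal, self-contained sketch. It is not HBase code: IdentityUser, ValueUser, Key and poolSizeAfter are hypothetical stand-ins invented for illustration, the rpcTimeout of 60000 is an arbitrary value, and it uses Java 8 lambdas, which are newer than the client code quoted in the thread. It only shows how a map key that embeds an object without equals()/hashCode() never matches a fresh copy of itself, and how solutions 1 and 2 from the list change that.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

final class ConnectionKeyDemo {

    /** Hypothetical stand-in for a User that inherits identity-based equals/hashCode from Object. */
    static final class IdentityUser {
        final String name;
        IdentityUser(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a User with value-based equals/hashCode (solution 1). */
    static final class ValueUser {
        final String name;
        ValueUser(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof ValueUser && name.equals(((ValueUser) o).name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    /** Simplified key loosely modelled on ConnectionId: address, ticket and rpcTimeout. */
    static final class Key {
        final String address;
        final Object ticket;
        final int rpcTimeout;
        Key(String address, Object ticket, int rpcTimeout) {
            this.address = address;
            this.ticket = ticket;
            this.rpcTimeout = rpcTimeout;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            // Delegates to ticket.equals(), so the ticket's own equality semantics decide the match.
            return address.equals(k.address)
                    && Objects.equals(ticket, k.ticket)
                    && rpcTimeout == k.rpcTimeout;
        }
        @Override public int hashCode() {
            return Objects.hash(address, ticket, rpcTimeout);
        }
    }

    /** Simulates `calls` requests, each building a fresh key, and returns the resulting pool size. */
    static int poolSizeAfter(int calls, Supplier<Object> ticketFactory) {
        Map<Key, Object> connections = new HashMap<>();
        for (int i = 0; i < calls; i++) {
            // 60000 is an arbitrary timeout chosen for the sketch.
            Key key = new Key("10.1.37.21:60020", ticketFactory.get(), 60000);
            connections.computeIfAbsent(key, k -> new Object());
        }
        return connections.size();
    }

    public static void main(String[] args) {
        // A new identity-based ticket per call: every lookup misses, one entry per call.
        System.out.println(poolSizeAfter(100, () -> new IdentityUser("jeff")));
        // Solution 1: value-based equality, so fresh tickets still map to one entry.
        System.out.println(poolSizeAfter(100, () -> new ValueUser("jeff")));
        // Solution 2: reuse a single ticket instance.
        IdentityUser shared = new IdentityUser("jeff");
        System.out.println(poolSizeAfter(100, () -> shared));
    }
}
```

With a fresh IdentityUser per call the map grows by one entry per request, which matches the ever-increasing "having connections" count in the log. With either value-based equality or a single shared instance the map stays at one entry. Whether value-based equality is appropriate for the real User class under secure Hadoop is a separate question that this sketch does not address.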