Subject: Re: load balancing considerations
From: Ted Yu
To: dev@hbase.apache.org
Date: Wed, 11 Aug 2010 07:09:32 -0700

The client was doing this:
Flushing 52428400 into

On Wed, Aug 11, 2010 at 3:09 AM, Ted Yu wrote:
> Here is the client-side stack trace:
>
> java.io.IOException: Call to us01-ciqps1-grid01.carrieriq.com/10.32.42.233:60020 failed on local exception: java.io.EOFException
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
> java.net.ConnectException: Connection refused
>
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1037)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1222)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1144)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1230)
>     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
>     at com.carrieriq.m2m.platform.mmp2.input.StripedHBaseTable.flushAllStripesNew(StripedHBaseTable.java:300)
>
> On Tue, Aug 10, 2010 at 11:01 PM, Ryan Rawson wrote:
>
>> Use a tool like YourKit to grovel that heap; the open source tools are not really there yet.
>>
>> But your stack trace tells a lot: the fatal allocation is in the RPC layer. Either a client is sending a massive value, or you have a semi-hostile network client sending bytes to your open socket which are being interpreted as the buffer size to allocate. If you look at the actual RPC code (any RPC code, really), there is often a 'length' field which is then used to allocate a dynamic buffer.
>>
>> -ryan
>>
>> On Tue, Aug 10, 2010 at 10:55 PM, Ted Yu wrote:
>> > The compressed file is still big:
>> > -rw-r--r-- 1 hadoop users 809768340 Aug 11 05:49 java_pid26972.hprof.gz
>> >
>> > If you can tell me specific things to look for in the dump, I will collect them (through jhat) and publish.
>> >
>> > Thanks
>> >
>> > On Tue, Aug 10, 2010 at 10:29 PM, Stack wrote:
>> >
>> >> On Tue, Aug 10, 2010 at 9:52 PM, Ted Yu wrote:
>> >> > Here are the GC-related parameters:
>> >> > /usr/java/jdk1.6/bin/java -Xmx4000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
>> >>
>> >> You have > 2 CPUs per machine, I take it? You could probably drop the conservative -XX:+CMSIncrementalMode.
>> >>
>> >> > The heap dump is big:
>> >> > -rw------- 1 hadoop users 4146551927 Aug 11 03:59 java_pid26972.hprof
>> >> >
>> >> > Do you have an FTP server where I can upload it?
>> >>
>> >> Not really. I was hoping you could put a compressed version on an HTTP server somewhere that I could pull from. You might as well include the GC log while you are at it.
>> >>
>> >> Thanks Ted,
>> >>
>> >> St.Ack
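Ryan's point about length-prefixed RPC framing can be illustrated with a small Java sketch. This is not HBase's actual RPC code: the class name LengthPrefixedReader, the readFrame method, and the 64 MB MAX_FRAME_BYTES cap are hypothetical, chosen only to show the pattern. The reader takes a 4-byte big-endian length from the stream and allocates the buffer only after checking that length against a cap. The second case shows how stray text on the socket, here an HTTP request line, decodes as a length of roughly 1.1 GB when read as an int.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefixedReader {
    // Hypothetical upper bound on a single frame; not an HBase configuration key.
    public static final int MAX_FRAME_BYTES = 64 * 1024 * 1024;

    /** Reads one frame: a 4-byte big-endian length followed by that many payload bytes. */
    public static byte[] readFrame(DataInputStream in, int maxBytes) throws IOException {
        int length = in.readInt(); // untrusted: whatever 4 bytes arrived on the socket
        if (length < 0 || length > maxBytes) {
            // Reject before allocating, instead of attempting a huge new byte[length].
            throw new IOException("Refusing to allocate " + length + " bytes (limit " + maxBytes + ")");
        }
        byte[] buf = new byte[length]; // sized only after validation
        in.readFully(buf);             // throws EOFException if the peer sent fewer bytes
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // Well-formed frame: length 5, payload "hello".
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(5);
        out.write("hello".getBytes(StandardCharsets.US_ASCII));
        byte[] ok = readFrame(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())), MAX_FRAME_BYTES);
        System.out.println(new String(ok, StandardCharsets.US_ASCII));

        // Stray bytes: "GET " read as an int is 0x47455420 = 1195725856 (about 1.1 GB).
        byte[] junk = "GET / HTTP/1.0\r\n".getBytes(StandardCharsets.US_ASCII);
        try {
            readFrame(new DataInputStream(new ByteArrayInputStream(junk)), MAX_FRAME_BYTES);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints "hello" followed by "Refusing to allocate 1195725856 bytes (limit 67108864)". Without the check, the same junk would go straight into new byte[1195725856], which is the kind of allocation Ryan describes.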