Subject: Re: Garbage collection issues
From: Uday Jarajapu
To: user@hbase.apache.org
Date: Tue, 22 May 2012 21:40:07 -0700

You mentioned in your email that the "total data size varies between about
1 & 2K". I am guessing you meant that each individual record varies between
1 and 2 KB. If that is true, there is a good chance that you are hitting the
CMS occupancy fraction sooner than you otherwise would because of the varying
record size. Consider encoding as a way to limit the variation in individual
record sizes; the OpenTSDB schema is a nice example of how encoding can be
used to accomplish this.

On Mon, May 21, 2012 at 6:15 AM, Simon Kelly wrote:

> Great, thanks very much for the help. I'm going to see if I can get more
> memory into the servers and will also experiment with
> -XX:ParallelGCThreads. We already have
> -XX:CMSInitiatingOccupancyFraction=70 in the config.
>
> Uday, what do you mean by "a fixed size record"? Do you mean the record
> that is being written to HBase?
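A minimal sketch of the kind of fixed-width, OpenTSDB-style encoding being
suggested here, assuming a Java client and HBase's
org.apache.hadoop.hbase.util.Bytes helper; the FixedWidthRecord class and its
(metricId, timestamp, value) layout are purely illustrative and are not the
schema discussed in this thread:

    import org.apache.hadoop.hbase.util.Bytes;

    /**
     * Illustrative sketch only: pack a record into fixed-width fields,
     * OpenTSDB-style, so every cell written to HBase has the same size.
     * The (metricId, timestamp, value) layout is hypothetical.
     */
    public class FixedWidthRecord {

        // 4-byte metric id + 8-byte timestamp + 8-byte value = 20 bytes, always.
        public static byte[] encode(int metricId, long timestampMs, double value) {
            byte[] out = new byte[4 + 8 + 8];
            System.arraycopy(Bytes.toBytes(metricId), 0, out, 0, 4);
            System.arraycopy(Bytes.toBytes(timestampMs), 0, out, 4, 8);
            System.arraycopy(Bytes.toBytes(value), 0, out, 12, 8);
            return out;
        }

        public static void main(String[] args) {
            byte[] cell = encode(42, System.currentTimeMillis(), 98.6);
            System.out.println("encoded length = " + cell.length); // always 20
        }
    }

Keeping every cell the same size is what suggestion #4 further down the
thread is getting at: a predictable allocation size makes old-generation
fragmentation, and therefore promotion failures, less likely.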
> On 19 May 2012 12:44, Uday Jarajapu wrote:
>
> > Also, try playing with:
> >
> > #3) -XX:CMSInitiatingOccupancyFraction=70, to kick off a CMS GC sooner
> > than the default trigger would.
> >
> > #4) a fixed-size record, to make sure you do not run into promotion
> > failure due to fragmentation.
> >
> > On Fri, May 18, 2012 at 4:35 PM, Uday Jarajapu wrote:
> >
> >> I think you have it right for the most part, except you are underarmed
> >> with only 8 GB and a 4-core box. Since you have -Xmx = -Xms = 4G, the
> >> default (parallel) collector with the right number of threads might be
> >> able to pull it off. In fact, CMS might be defaulting to that eventually.
> >>
> >> As you know, CMS is great for sweeping heap sizes in the 8-16 GB range,
> >> but it eventually falls back to parallel GC for smaller heaps that run
> >> out of space quickly. On top of that, it is non-compacting, so what works
> >> for a couple of cycles might quickly run out of room and leave no choice
> >> but a stop-the-world collection. To avoid the hit when that happens, try
> >> limiting the number of parallel GC threads to about a third of your
> >> cores. In your case that would unfortunately be 1; try 1 or 2.
> >>
> >> I would recommend trying one of these two tests on the region server:
> >>
> >> #1) -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> >>     -XX:ParallelGCThreads=1 (or 2)
> >>
> >> #2) -XX:ParallelGCThreads=2
> >>
> >> The second test is just for giggles, to see whether the CMS aspect is
> >> helping you at all (or whether you are ending up with more stop-the-world
> >> collections than you want; if that is the case, try the default GC).
> >>
> >> Hope that helps,
> >> Uday
> >>
> >> On Fri, May 18, 2012 at 4:54 AM, Simon Kelly wrote:
> >>
> >>> Hi
> >>>
> >>> Firstly, let me compliment the HBase team on a great piece of software.
> >>> We're running a few clusters that are working well, but we're really
> >>> struggling with a new one I'm trying to set up and could use a bit of
> >>> help. I have read as much as I can but just can't seem to get it right.
> >>>
> >>> The difference between this cluster and the others is that this one's
> >>> load is 99% writes. Each write contains about 40 columns to a single
> >>> table and column family, and the total data size varies between about
> >>> 1 & 2K. The load per server varies between 20 and 90 requests per
> >>> second at different times of the day. The row keys are UUIDs, so they
> >>> are uniformly distributed across the (currently 60) regions.
> >>>
> >>> The problem seems to be that after some time a GC cycle takes longer
> >>> than expected on one of the regionservers, and the master kills that
> >>> regionserver.
> >>>
> >>> This morning I ran the system up until the first regionserver failure
> >>> and recorded the data with Ganglia. I have attached the following
> >>> Ganglia graphs:
> >>>
> >>> - hbase.regionserver.compactionQueueSize
> >>> - hbase.regionserver.memstoreSizeMB
> >>> - requests_per_minute (to the service that calls HBase)
> >>> - request_processing_time (of the service that calls HBase)
> >>>
> >>> Any assistance would be greatly appreciated. I did have GC logging on,
> >>> so I have access to all that data too.
> >>> Best regards,
> >>> Simon Kelly
> >>>
> >>> Cluster details
> >>> ---------------
> >>> It's running on 5 machines with the following specs:
> >>>
> >>> - CPUs: 4 x 2.39 GHz
> >>> - RAM: 8 GB
> >>> - Ubuntu 10.04.2 LTS
> >>>
> >>> The Hadoop cluster (version 1.0.1, r1243785) runs over all the
> >>> machines, which have 8 TB of capacity (60% unused). On top of that is
> >>> HBase version 0.92.1, r1298924. All the servers run Hadoop datanodes
> >>> and HBase regionservers. One server hosts the Hadoop primary namenode
> >>> and the HBase master, and three servers form the ZooKeeper quorum.
> >>>
> >>> The HBase config is as follows:
> >>>
> >>> - HBASE_OPTS="-Xmn128m -ea -XX:+UseConcMarkSweepGC
> >>>   -XX:+CMSIncrementalMode -XX:+UseParNewGC
> >>>   -XX:CMSInitiatingOccupancyFraction=70"
> >>> - HBASE_HEAPSIZE=4096
> >>>
> >>> - hbase.rootdir : hdfs://server1:8020/hbase
> >>> - hbase.cluster.distributed : true
> >>> - hbase.zookeeper.property.clientPort : 2222
> >>> - hbase.zookeeper.quorum : server1,server2,server3
> >>> - zookeeper.session.timeout : 30000
> >>> - hbase.regionserver.maxlogs : 16
> >>> - hbase.regionserver.handler.count : 50
> >>> - hbase.regionserver.codecs : lzo
> >>> - hbase.master.startup.retainassign : false
> >>> - hbase.hregion.majorcompaction : 0
> >>>
> >>> (For the benefit of those without the attachments, I'll describe the
> >>> graphs:
> >>>
> >>> - 0900 - system starts
> >>> - 1010 - memstore reaches 1.2 GB and flushes to 500 MB; a few HBase
> >>>   compactions happen and there is a slight increase in
> >>>   request_processing_time
> >>> - 1040 - memstore reaches 1.0 GB and flushes to 500 MB (no HBase
> >>>   compactions)
> >>> - 1110 - memstore reaches 1.0 GB and flushes to 300 MB; a few more
> >>>   HBase compactions happen and a slightly larger increase in
> >>>   request_processing_time
> >>> - 1200 - memstore reaches 1.3 GB and flushes to 200 MB; more HBase
> >>>   compactions and another increase in request_processing_time
> >>> - 1230 - HBase logs for server1 record "We slept 13318ms instead of
> >>>   3000ms"; regionserver1 is killed by the master and
> >>>   request_processing_time goes way up
> >>> - 1326 - HBase logs for server3 record "We slept 77377ms instead of
> >>>   3000ms"; regionserver2 is killed by the master
> >>> )
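For reference, a rough sketch of how Uday's test #1 could be expressed in
conf/hbase-env.sh, merged with the heap settings Simon already quotes above.
This is illustrative only: ParallelGCThreads=2 follows the "about a third of
your cores" suggestion for a 4-core box, and the GC-logging flags are generic
HotSpot options rather than the ones Simon actually used.

    # Illustrative sketch only; not the thread's final configuration.
    # Heap settings come from Simon's existing config; the GC flags are
    # Uday's test #1 plus an explicit ParallelGCThreads cap.
    export HBASE_HEAPSIZE=4096
    export HBASE_OPTS="-Xmn128m -ea \
      -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC \
      -XX:CMSInitiatingOccupancyFraction=70 \
      -XX:ParallelGCThreads=2 \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

Running the same cluster with only -XX:ParallelGCThreads=2 (Uday's test #2)
and comparing the GC logs would show whether CMS is actually helping or
whether stop-the-world collections dominate either way.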