Subject: Re: Garbage collection issues
From: Uday Jarajapu
To: user@hbase.apache.org
Date: Tue, 22 May 2012 21:40:07 -0700

You mentioned in your email that the "total data size varies between about
1 & 2K". I am guessing you meant that each individual record varies between
1 and 2 KB. If that is true, there is a good chance that you are hitting the
CMS occupancy fraction sooner than you otherwise would because of the varying
record size. Consider encoding as a way to limit the variation in individual
record sizes; the OpenTSDB schema is a nice example of how encoding can be
used to accomplish this.

On Mon, May 21, 2012 at 6:15 AM, Simon Kelly wrote:

> Great, thanks very much for the help. I'm going to see if I can get more
> memory into the servers and will also experiment with
> -XX:ParallelGCThreads. We already have
> -XX:CMSInitiatingOccupancyFraction=70 in the config.
>
> Uday, what do you mean by "a fixed size record"? Do you mean the record
> that is being written to HBase?
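A minimal sketch of the kind of fixed-width, OpenTSDB-style encoding being
suggested here, assuming a Java client and HBase's
org.apache.hadoop.hbase.util.Bytes helper; the FixedWidthRecord class and its
(metricId, timestamp, value) layout are purely illustrative and are not the
schema discussed in this thread:

    import org.apache.hadoop.hbase.util.Bytes;

    /**
     * Illustrative sketch only: pack a record into fixed-width fields,
     * OpenTSDB-style, so every cell written to HBase has the same size.
     * The (metricId, timestamp, value) layout is hypothetical.
     */
    public class FixedWidthRecord {

        // 4-byte metric id + 8-byte timestamp + 8-byte value = 20 bytes, always.
        public static byte[] encode(int metricId, long timestampMs, double value) {
            byte[] out = new byte[4 + 8 + 8];
            System.arraycopy(Bytes.toBytes(metricId), 0, out, 0, 4);
            System.arraycopy(Bytes.toBytes(timestampMs), 0, out, 4, 8);
            System.arraycopy(Bytes.toBytes(value), 0, out, 12, 8);
            return out;
        }

        public static void main(String[] args) {
            byte[] cell = encode(42, System.currentTimeMillis(), 98.6);
            System.out.println("encoded length = " + cell.length); // always 20
        }
    }

Keeping every cell the same size is what suggestion #4 further down the
thread is getting at: a predictable allocation size makes old-generation
fragmentation, and therefore promotion failures, less likely.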
> On 19 May 2012 12:44, Uday Jarajapu wrote:
>
> > Also, try playing with:
> >
> > #3) -XX:CMSInitiatingOccupancyFraction=70, to kick off a CMS GC sooner
> > than the default trigger would.
> >
> > #4) a fixed-size record, to make sure you do not run into promotion
> > failure due to fragmentation.
> >
> > On Fri, May 18, 2012 at 4:35 PM, Uday Jarajapu wrote:
> >
> >> I think you have it right for the most part, except you are underarmed
> >> with only 8 GB and a 4-core box. Since you have -Xmx = -Xms = 4G, the
> >> default (parallel) collector with the right number of threads might be
> >> able to pull it off. In fact, CMS might be defaulting to that eventually.
> >>
> >> As you know, CMS is great for sweeping heap sizes in the 8-16 GB range,
> >> but it eventually falls back to parallel GC for smaller heaps that run
> >> out of space quickly. On top of that, it is non-compacting, so what works
> >> for a couple of cycles might quickly run out of room and leave no choice
> >> but a stop-the-world collection. To avoid the hit when that happens, try
> >> limiting the number of parallel GC threads to about a third of your
> >> cores. In your case that would unfortunately be 1; try 1 or 2.
> >>
> >> I would recommend trying one of these two tests on the region server:
> >>
> >> #1) -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> >>     -XX:ParallelGCThreads=1 (or 2)
> >>
> >> #2) -XX:ParallelGCThreads=2
> >>
> >> The second test is just for giggles, to see whether the CMS aspect is
> >> helping you at all (or whether you are ending up with more stop-the-world
> >> collections than you want; if that is the case, try the default GC).
> >>
> >> Hope that helps,
> >> Uday
> >>
> >> On Fri, May 18, 2012 at 4:54 AM, Simon Kelly wrote:
> >>
> >>> Hi
> >>>
> >>> Firstly, let me compliment the HBase team on a great piece of software.
> >>> We're running a few clusters that are working well, but we're really
> >>> struggling with a new one I'm trying to set up and could use a bit of
> >>> help. I have read as much as I can but just can't seem to get it right.
> >>>
> >>> The difference between this cluster and the others is that this one's
> >>> load is 99% writes. Each write contains about 40 columns to a single
> >>> table and column family, and the total data size varies between about
> >>> 1 & 2K. The load per server varies between 20 and 90 requests per
> >>> second at different times of the day. The row keys are UUIDs, so they
> >>> are uniformly distributed across the (currently 60) regions.
> >>>
> >>> The problem seems to be that after some time a GC cycle takes longer
> >>> than expected on one of the regionservers, and the master kills that
> >>> regionserver.
> >>>
> >>> This morning I ran the system up until the first regionserver failure
> >>> and recorded the data with Ganglia. I have attached the following
> >>> Ganglia graphs:
> >>>
> >>> - hbase.regionserver.compactionQueueSize
> >>> - hbase.regionserver.memstoreSizeMB
> >>> - requests_per_minute (to the service that calls HBase)
> >>> - request_processing_time (of the service that calls HBase)
> >>>
> >>> Any assistance would be greatly appreciated. I did have GC logging on,
> >>> so I have access to all that data too.
> >>> Best regards,
> >>> Simon Kelly
> >>>
> >>> Cluster details
> >>> ---------------
> >>> It's running on 5 machines with the following specs:
> >>>
> >>> - CPUs: 4 x 2.39 GHz
> >>> - RAM: 8 GB
> >>> - Ubuntu 10.04.2 LTS
> >>>
> >>> The Hadoop cluster (version 1.0.1, r1243785) runs over all the
> >>> machines, which have 8 TB of capacity (60% unused). On top of that is
> >>> HBase version 0.92.1, r1298924. All the servers run Hadoop datanodes
> >>> and HBase regionservers. One server hosts the Hadoop primary namenode
> >>> and the HBase master, and three servers form the ZooKeeper quorum.
> >>>
> >>> The HBase config is as follows:
> >>>
> >>> - HBASE_OPTS="-Xmn128m -ea -XX:+UseConcMarkSweepGC
> >>>   -XX:+CMSIncrementalMode -XX:+UseParNewGC
> >>>   -XX:CMSInitiatingOccupancyFraction=70"
> >>> - HBASE_HEAPSIZE=4096
> >>>
> >>> - hbase.rootdir : hdfs://server1:8020/hbase
> >>> - hbase.cluster.distributed : true
> >>> - hbase.zookeeper.property.clientPort : 2222
> >>> - hbase.zookeeper.quorum : server1,server2,server3
> >>> - zookeeper.session.timeout : 30000
> >>> - hbase.regionserver.maxlogs : 16
> >>> - hbase.regionserver.handler.count : 50
> >>> - hbase.regionserver.codecs : lzo
> >>> - hbase.master.startup.retainassign : false
> >>> - hbase.hregion.majorcompaction : 0
> >>>
> >>> (For the benefit of those without the attachments, I'll describe the
> >>> graphs:
> >>>
> >>> - 0900 - system starts
> >>> - 1010 - memstore reaches 1.2 GB and flushes to 500 MB; a few HBase
> >>>   compactions happen and there is a slight increase in
> >>>   request_processing_time
> >>> - 1040 - memstore reaches 1.0 GB and flushes to 500 MB (no HBase
> >>>   compactions)
> >>> - 1110 - memstore reaches 1.0 GB and flushes to 300 MB; a few more
> >>>   HBase compactions happen and a slightly larger increase in
> >>>   request_processing_time
> >>> - 1200 - memstore reaches 1.3 GB and flushes to 200 MB; more HBase
> >>>   compactions and another increase in request_processing_time
> >>> - 1230 - HBase logs for server1 record "We slept 13318ms instead of
> >>>   3000ms"; regionserver1 is killed by the master and
> >>>   request_processing_time goes way up
> >>> - 1326 - HBase logs for server3 record "We slept 77377ms instead of
> >>>   3000ms"; regionserver2 is killed by the master
> >>> )
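For reference, a rough sketch of how Uday's test #1 could be expressed in
conf/hbase-env.sh, merged with the heap settings Simon already quotes above.
This is illustrative only: ParallelGCThreads=2 follows the "about a third of
your cores" suggestion for a 4-core box, and the GC-logging flags are generic
HotSpot options rather than the ones Simon actually used.

    # Illustrative sketch only; not the thread's final configuration.
    # Heap settings come from Simon's existing config; the GC flags are
    # Uday's test #1 plus an explicit ParallelGCThreads cap.
    export HBASE_HEAPSIZE=4096
    export HBASE_OPTS="-Xmn128m -ea \
      -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC \
      -XX:CMSInitiatingOccupancyFraction=70 \
      -XX:ParallelGCThreads=2 \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

Running the same cluster with only -XX:ParallelGCThreads=2 (Uday's test #2)
and comparing the GC logs would show whether CMS is actually helping or
whether stop-the-world collections dominate either way.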