hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Yu <yuzhih...@gmail.com>
Subject Re: Slow sync cost
Date Wed, 27 Apr 2016 23:18:26 GMT
There might be a typo:

bq. After the Evacuation phase, Eden and Survivor To are devoid of live
data and reclaimed.

>From the graph below it, it seems Survivor From is reclaimed, not Survivor
To.

FYI

On Wed, Apr 27, 2016 at 7:39 AM, Bryan Beaudreault <bbeaudreault@hubspot.com
> wrote:

> We have 6 production clusters and all of them are tuned differently, so I'm
> not sure there is a setting I could easily give you. It really depends on
> the usage.  One of our devs wrote a blog post on G1GC fundamentals
> recently. It's rather long, but could be worth a read:
>
> http://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection
>
> We will also have a blog post coming out in the next week or so that talks
> specifically to tuning G1GC for HBase. I can update this thread when that's
> available.
>
> On Tue, Apr 26, 2016 at 8:08 PM Saad Mufti <saad.mufti@gmail.com> wrote:
>
> > That is interesting. Would it be possible for you to share what GC
> settings
> > you ended up on that gave you the most predictable performance?
> >
> > Thanks.
> >
> > ----
> > Saad
> >
> >
> > On Tue, Apr 26, 2016 at 11:56 AM, Bryan Beaudreault <
> > bbeaudreault@hubspot.com> wrote:
> >
> > > We were seeing this for a while with our CDH5 HBase clusters too. We
> > > eventually correlated it very closely to GC pauses. Through heavily
> > tuning
> > > our GC we were able to drastically reduce the logs, by keeping most
> GC's
> > > under 100ms.
> > >
> > > On Tue, Apr 26, 2016 at 6:25 AM Saad Mufti <saad.mufti@gmail.com>
> wrote:
> > >
> > > > From what I can see in the source code, the default is actually even
> > > lower
> > > > at 100 ms (can be overridden with
> hbase.regionserver.hlog.slowsync.ms
> > ).
> > > >
> > > > ----
> > > > Saad
> > > >
> > > >
> > > > On Tue, Apr 26, 2016 at 3:13 AM, Kevin Bowling <
> > kevin.bowling@kev009.com
> > > >
> > > > wrote:
> > > >
> > > > > I see similar log spam while system has reasonable performance.
> Was
> > > the
> > > > > 250ms default chosen with SSDs and 10ge in mind or something?  I
> > guess
> > > > I'm
> > > > > surprised a sync write several times through JVMs to 2 remote
> > datanodes
> > > > > would be expected to consistently happen that fast.
> > > > >
> > > > > Regards,
> > > > >
> > > > > On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti <saad.mufti@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > In our large HBase cluster based on CDH 5.5 in AWS, we're
> > constantly
> > > > > seeing
> > > > > > the following messages in the region server logs:
> > > > > >
> > > > > > 2016-04-25 14:02:55,178 INFO
> > > > > > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost:
> > 258
> > > > ms,
> > > > > > current pipeline:
> > > > > > [DatanodeInfoWithStorage[10.99.182.165:50010
> > > > > > ,DS-281d4c4f-23bd-4541-bedb-946e57a0f0fd,DISK],
> > > > > > DatanodeInfoWithStorage[10.99.182.236:50010
> > > > > > ,DS-f8e7e8c9-6fa0-446d-a6e5-122ab35b6f7c,DISK],
> > > > > > DatanodeInfoWithStorage[10.99.182.195:50010
> > > > > > ,DS-3beae344-5a4a-4759-ad79-a61beabcc09d,DISK]]
> > > > > >
> > > > > >
> > > > > > These happen regularly while HBase appear to be operating
> normally
> > > with
> > > > > > decent read and write performance. We do have occasional
> > performance
> > > > > > problems when regions are auto-splitting, and at first I thought
> > this
> > > > was
> > > > > > related but now I se it happens all the time.
> > > > > >
> > > > > >
> > > > > > Can someone explain what this means really and should we be
> > > concerned?
> > > > I
> > > > > > tracked down the source code that outputs it in
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> > > > > >
> > > > > > but after going through the code I think I'd need to know much
> more
> > > > about
> > > > > > the code to glean anything from it or the associated JIRA ticket
> > > > > > https://issues.apache.org/jira/browse/HBASE-11240.
> > > > > >
> > > > > > Also, what is this "pipeline" the ticket and code talks about?
> > > > > >
> > > > > > Thanks in advance for any information and/or clarification anyone
> > can
> > > > > > provide.
> > > > > >
> > > > > > ----
> > > > > >
> > > > > > Saad
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message