hbase-user mailing list archives

From Saad Mufti <saad.mu...@gmail.com>
Subject Re: Major Compaction Strategy
Date Fri, 29 Apr 2016 23:40:03 GMT
It is only because we prepend a good-quality hash to our incoming keys to
get a more even distribution and avoid hotspots, plus we have huge amounts
of traffic.
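
In case it helps, the salting looks roughly like the sketch below; the MD5
hash and the 4-byte prefix width are illustrative, not our exact scheme:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import org.apache.hadoop.hbase.util.Bytes;

    public final class SaltedKeys {
      // Prepend a fixed-width hash of the natural key so writes spread evenly
      // across pre-split regions instead of hotspotting one region.
      static byte[] saltedRowKey(String naturalKey) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5")
            .digest(naturalKey.getBytes(StandardCharsets.UTF_8));
        // the 4-byte prefix width is an assumption; it only needs to spread
        // keys across the number of pre-split regions
        byte[] prefix = new byte[4];
        System.arraycopy(digest, 0, prefix, 0, 4);
        return Bytes.add(prefix, Bytes.toBytes(naturalKey));
      }
    }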

----
Saad


On Fri, Apr 29, 2016 at 6:52 PM, Frank Luo <jluo@merkleinc.com> wrote:

> I have to say you have an extremely good design to have all your 7000
> regions hot at all times.
>
> In my world, due to the nature of the data, we always have 10 to 20% of
> regions sitting idle for a bit of time.
>
> To be precise, the program gets R/W counts on a region, waits for one
> minute, then gets the counts again. If they are unchanged, the region is
> considered idle.
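>
> Roughly, the check looks like the sketch below (a simplification of what
> the program does, using the 1.x ClusterStatus/RegionLoad API; the class
> and method names here are made up):
>
>     import java.io.IOException;
>     import java.util.HashMap;
>     import java.util.Map;
>     import org.apache.hadoop.hbase.ClusterStatus;
>     import org.apache.hadoop.hbase.RegionLoad;
>     import org.apache.hadoop.hbase.ServerName;
>     import org.apache.hadoop.hbase.client.Admin;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public final class IdleRegionCheck {
>       // Snapshot the per-region read+write request counters for the cluster.
>       static Map<String, Long> snapshotRequestCounts(Admin admin) throws IOException {
>         Map<String, Long> counts = new HashMap<>();
>         ClusterStatus status = admin.getClusterStatus();
>         for (ServerName sn : status.getServers()) {
>           for (Map.Entry<byte[], RegionLoad> e
>               : status.getLoad(sn).getRegionsLoad().entrySet()) {
>             RegionLoad rl = e.getValue();
>             counts.put(Bytes.toStringBinary(e.getKey()),
>                 rl.getReadRequestsCount() + rl.getWriteRequestsCount());
>           }
>         }
>         return counts;
>       }
>
>       // A region counts as idle if its counter did not move between two
>       // snapshots taken a minute apart:
>       //   Map<String, Long> before = snapshotRequestCounts(admin);
>       //   Thread.sleep(60000L);
>       //   Map<String, Long> after = snapshotRequestCounts(admin);
>       //   boolean idle = after.get(name).equals(before.get(name));
>     }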
>
> -----Original Message-----
> From: Saad Mufti [mailto:saad.mufti@gmail.com]
> Sent: Friday, April 29, 2016 5:37 PM
> To: user@hbase.apache.org
> Subject: Re: Major Compaction Strategy
>
> Unfortunately all our tables and regions are active 24/7. Traffic does
> fall somewhat at night but there is no real downtime.
>
> It is not user-facing load though, so I guess we could turn off traffic
> for a while and let data queue up in Kafka. But not for too long, as then
> we're playing catch-up.
>
> ----
> Saad
>
> On Friday, April 29, 2016, Frank Luo <jluo@merkleinc.com> wrote:
>
> > Saad,
> >
> > Will all your tables/regions be used 24/7, or will only a subset of
> > regions be in use at any given time while the others sit idle?
> >
> > If the latter, I developed a tool to launch major compactions in a
> > "smart" way, because I am facing a similar issue:
> > https://github.com/jinyeluo/smarthbasecompactor
> >
> > It looks at every RegionServer, finds the non-hot regions with the most
> > store files, and starts compacting. It just continues until the time is
> > up. Just to be clear, it doesn't perform the MC itself, which would be a
> > scary thing to do, but tells the region servers to do the MC.
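> >
> > The core loop is shaped roughly like the sketch below. This is not the
> > tool's actual code; it uses the 1.x ClusterStatus/RegionLoad API, leaves
> > out the hotness and time-budget checks, and the names are made up:
> >
> >     import java.io.IOException;
> >     import java.util.Map;
> >     import org.apache.hadoop.hbase.ClusterStatus;
> >     import org.apache.hadoop.hbase.RegionLoad;
> >     import org.apache.hadoop.hbase.ServerName;
> >     import org.apache.hadoop.hbase.client.Admin;
> >
> >     public final class FullestRegionCompactor {
> >       // For each region server, pick the region carrying the most store
> >       // files and ask that RS to major-compact it. The real tool also
> >       // skips hot regions and stops when its time budget runs out.
> >       static void compactFullestRegionPerServer(Admin admin) throws IOException {
> >         ClusterStatus status = admin.getClusterStatus();
> >         for (ServerName sn : status.getServers()) {
> >           byte[] candidate = null;
> >           int maxStoreFiles = -1;
> >           for (Map.Entry<byte[], RegionLoad> e
> >               : status.getLoad(sn).getRegionsLoad().entrySet()) {
> >             int storeFiles = e.getValue().getStorefiles();
> >             if (storeFiles > maxStoreFiles) {
> >               maxStoreFiles = storeFiles;
> >               candidate = e.getKey();
> >             }
> >           }
> >           if (candidate != null) {
> >             admin.majorCompactRegion(candidate);  // the RS does the actual work
> >           }
> >         }
> >       }
> >     }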
> >
> > We have it running in our cluster for about 10 hours a day; it has
> > virtually no impact on applications, and the cluster is doing far better
> > than it did with the default scheduled MC.
> >
> >
> > -----Original Message-----
> > From: Saad Mufti [mailto:saad.mufti@gmail.com]
> > Sent: Friday, April 29, 2016 1:51 PM
> > To: user@hbase.apache.org
> > Subject: Re: Major Compaction Strategy
> >
> > We have more issues now. After testing this in dev, we tried rolling
> > compaction in our production cluster, which has tons of data (60 region
> > servers and around 7000 regions), and most regions around 6-7 GB in size
> > were taking 4-5 minutes to finish. Based on this we estimated it would
> > take something like 20 days for a single run to finish, which doesn't
> > seem reasonable.
> >
> > So is it more reasonable to aim for doing major compaction across all
> > region servers at once, but within a RS one region at a time? That would
> > cut it down to around 8 hours, which is still very long. Or is it better
> > to compact all regions on one region server, then move on to the next?
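> >
> > Concretely I'm imagining something like the rough sketch below, using
> > the getCompactionStateForRegion poll from earlier in this thread to
> > serialize regions within a server. The class and method names are made
> > up and it assumes an open Connection:
> >
> >     import java.io.IOException;
> >     import java.util.Set;
> >     import java.util.concurrent.ExecutorService;
> >     import java.util.concurrent.Executors;
> >     import org.apache.hadoop.hbase.ClusterStatus;
> >     import org.apache.hadoop.hbase.ServerName;
> >     import org.apache.hadoop.hbase.client.Admin;
> >     import org.apache.hadoop.hbase.client.Connection;
> >     // 1.x location of the enum; 2.x moved it to o.a.h.hbase.client
> >     import org.apache.hadoop.hbase.protobuf.generated.AdminProtos.GetRegionInfoResponse.CompactionState;
> >
> >     public final class RollingCompactor {
> >       // One worker per region server; each worker compacts that server's
> >       // regions one at a time, so a RS never runs more than one of our
> >       // major compactions while all servers make progress in parallel.
> >       static void compactAllServersInParallel(Connection connection) throws IOException {
> >         Admin admin = connection.getAdmin();
> >         ClusterStatus status = admin.getClusterStatus();
> >         admin.close();
> >         ExecutorService pool =
> >             Executors.newFixedThreadPool(status.getServers().size());
> >         for (ServerName sn : status.getServers()) {
> >           Set<byte[]> regions = status.getLoad(sn).getRegionsLoad().keySet();
> >           pool.submit(() -> {
> >             // separate Admin per worker; HBaseAdmin isn't documented as thread-safe
> >             try (Admin workerAdmin = connection.getAdmin()) {
> >               for (byte[] regionName : regions) {
> >                 workerAdmin.majorCompactRegion(regionName);
> >                 Thread.sleep(5000L);  // the request is queued asynchronously
> >                 CompactionState st = workerAdmin.getCompactionStateForRegion(regionName);
> >                 while (st == CompactionState.MAJOR || st == CompactionState.MAJOR_AND_MINOR) {
> >                   Thread.sleep(30000L);  // wait before starting the next region
> >                   st = workerAdmin.getCompactionStateForRegion(regionName);
> >                 }
> >               }
> >             } catch (Exception e) {
> >               // log and keep going; one bad region shouldn't stall the whole server
> >             }
> >           });
> >         }
> >         pool.shutdown();
> >       }
> >     }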
> >
> > The goal of all this is to maintain decent write performance while
> > still doing compaction. We don't have a good low-load period for our
> > cluster, so we are trying to find a way to do this without cluster
> > downtime.
> >
> > Thanks.
> >
> > ----
> > Saad
> >
> >
> > On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> >
> > > Thanks for the pointer. Working like a charm.
> > >
> > > ----
> > > Saad
> > >
> > >
> > > On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > >> Please use the following method of HBaseAdmin:
> > >>
> > >>   public CompactionState getCompactionStateForRegion(final byte[] regionName)
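> > >>
> > >> For example, you can trigger the compaction and then poll that method
> > >> until the region stops reporting a major compaction in progress. A rough
> > >> sketch (the class and method names are mine; in 1.x the enum comes from
> > >> AdminProtos.GetRegionInfoResponse):
> > >>
> > >>     import java.io.IOException;
> > >>     import org.apache.hadoop.hbase.client.Admin;
> > >>     // 1.x location of the enum returned by getCompactionStateForRegion
> > >>     import org.apache.hadoop.hbase.protobuf.generated.AdminProtos.GetRegionInfoResponse.CompactionState;
> > >>
> > >>     public final class CompactionUtil {
> > >>       // Request a major compaction, then block until the region no longer
> > >>       // reports one in progress.
> > >>       static void majorCompactAndWait(Admin admin, byte[] regionName)
> > >>           throws IOException, InterruptedException {
> > >>         admin.majorCompactRegion(regionName);
> > >>         // the request is queued asynchronously, so give the region server a
> > >>         // moment to pick it up before the first status check
> > >>         Thread.sleep(5000L);
> > >>         CompactionState state = admin.getCompactionStateForRegion(regionName);
> > >>         while (state == CompactionState.MAJOR
> > >>             || state == CompactionState.MAJOR_AND_MINOR) {
> > >>           Thread.sleep(30000L);  // poll every 30 seconds
> > >>           state = admin.getCompactionStateForRegion(regionName);
> > >>         }
> > >>       }
> > >>     }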
> > >>
> > >> Cheers
> > >>
> > >> On Tue, Apr 19, 2016 at 12:56 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > We have a large HBase 1.x cluster in AWS and have disabled automatic
> > >> > major compaction as advised. We were running our own code for
> > >> > compaction daily around midnight which calls
> > >> > HBaseAdmin.majorCompactRegion(byte[] regionName) in a rolling fashion
> > >> > across all regions.
> > >> >
> > >> > But we missed the fact that this is an asynchronous operation, so in
> > >> > practice this causes major compaction to run across all regions, at
> > >> > least those not already major compacted (for example because previous
> > >> > minor compactions got upgraded to major ones).
> > >> >
> > >> > We don't really have a suitable low load period, so what is a suitable
> > >> > way to make major compaction run in a rolling fashion, region by
> > >> > region? The API above provides no return value for us to be able to
> > >> > wait for one compaction to finish before moving on to the next.
> > >> >
> > >> > Thanks.
> > >> >
> > >> > ----
> > >> > Saad
> > >> >
> > >>
> > >
> > >
>
