hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Does 'online region merge' make regions unavailable for some time?
Date Thu, 22 Jan 2015 03:26:15 GMT
bq. doesn't that mean that a completely new region needs to be written

Yes, a new region (C in the pdf) would be created.

bq. If regions are a few GB in size

The data files from both regions are moved to the merged region's directory.

bq. (in flight) writes or reads going to the regions that are being merged?

The above operations have to wait for the merged region to be assigned.
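
A minimal sketch of what that waiting looks like from the client side, assuming the client simply retries with a growing pause until the merged region is assigned. Everything here is hypothetical and uses no HBase classes: `RegionOfflineException`, `SimulatedRegion` and `callWithRetries` are illustration-only names. The real HBase client performs its retries internally, governed by settings such as `hbase.client.retries.number` and `hbase.client.pause`.

```java
import java.util.function.Supplier;

public class MergeRetrySketch {

    /** Hypothetical stand-in for the error a client sees while a region is unassigned. */
    static class RegionOfflineException extends RuntimeException {
        RegionOfflineException(String message) {
            super(message);
        }
    }

    /** Simulated region that rejects the first few calls, as if it were still being merged. */
    static class SimulatedRegion {
        private int callsUntilOnline;

        SimulatedRegion(int callsUntilOnline) {
            this.callsUntilOnline = callsUntilOnline;
        }

        String get(String row) {
            if (callsUntilOnline > 0) {
                callsUntilOnline--;
                throw new RegionOfflineException("region not online yet");
            }
            return "value-for-" + row;
        }
    }

    /** Runs op, retrying up to maxRetries times with a linearly growing pause. */
    static <T> T callWithRetries(Supplier<T> op, int maxRetries, long pauseMs) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RegionOfflineException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.get();
            } catch (RegionOfflineException e) {
                last = e;
                if (attempt < maxRetries) {
                    sleepQuietly(pauseMs * (attempt + 1));
                }
            }
        }
        // All attempts failed: surface the last error to the caller.
        throw last;
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for region", e);
        }
    }

    public static void main(String[] args) {
        // The region rejects three calls, then serves the read on the fourth.
        SimulatedRegion merged = new SimulatedRegion(3);
        System.out.println(callWithRetries(() -> merged.get("row1"), 5, 10L));
    }
}
```

If the retries run out before the region comes back, the caller sees the last exception, so the retry budget has to cover the expected offline window.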

Cheers

On Wed, Jan 21, 2015 at 7:09 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Thanks Enis & Ted!
> A few more questions inline.
>
> On Wed, Jan 21, 2015 at 9:53 PM, Enis Söztutar <enis.soz@gmail.com> wrote:
>
> > Online in this context is HBase cluster being online, not individual
> > regions. For the merge process, the regions go briefly offline similar to
> > how splits work. It should be on the order of seconds.
> >
>
> Hm, but how could it be so quick?  Aren't regions first offlined and then
> one of them is *moved*?  Or maybe data is not actually sent over the
> network?
>
> But if 2 regions are being merged, doesn't that mean that a completely new
> region needs to be written (over the network, to disk, and then HDFS
> replication also needs to take place).  If regions are a few GB in size,
> can that really be done in a matter of seconds?
>
> What happens to the (in flight) writes or reads going to the regions that
> are being merged?
>
> Thanks,
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> > On Wed, Jan 21, 2015 at 10:26 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Please take a look at slides 5 and 6 in this file:
> > >
> > > https://issues.apache.org/jira/secure/attachment/12561887/merge%20region.pdf
> > >
> > > It is clear that the two regions to be merged are taken offline in step 1.
> > >
> > > Cheers
> > >
> > > On Tue, Jan 20, 2015 at 5:26 PM, Otis Gospodnetic <
> > > otis.gospodnetic@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Considering this is called the *online* region merge, I would assume
> > > > regions being merged never go offline during the merge and both regions
> > > > being merged are available for reading and writing at all times, even
> > > > during the merge.... though I don't get how writes would work if one region
> > > > is being moved from one RS to another.... so maybe this is not truly online
> > > > and writes are either rejected or buffered/blocked until the region is
> > > > moved AND merged?  Anyone knows for sure?
> > > >
> > > > I see this in one of the comments:
> > > > Q: If one (or both) of the regions were receiving non-trivial load prior to
> > > > this action, would client(s) be affected ?
> > > > A: Yes, region would be off services in a short time, it is equal with
> > > > moving region, e.g balance a region
> > > >
> > > > Also took a look at the patch:
> > > >
> > > > https://issues.apache.org/jira/secure/attachment/12574965/hbase-7403-trunkv33.patch
> > > >
> > > > And see:
> > > >
> > > > +    /**
> > > > +     * The merging region A has been taken out of the server's online regions list.
> > > > +     */
> > > > +    OFFLINED_REGION_A,
> > > >
> > > >
> > > > ... and if you look for the word "offline" in the patch I think it's
> > > > pretty clear that BOTH regions being merged do go offline at some
> > > > point.  I guess it could be after the merge, too, not before....
> > > >
> > > > ... maybe others know?
> > > >
> > > >
> > > > Thanks,
> > > > Otis
> > > > --
> > > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > > Solr & Elasticsearch Support * http://sematext.com/
> > > >
> > > >
> > > > On Mon, Jan 19, 2015 at 4:17 AM, Vladimir Tretyakov <
> > > > vladimir.tretyakov@sematext.com> wrote:
> > > >
> > > > > Hi, I have one question about 'online region merge' (
> > > > > https://issues.apache.org/jira/browse/HBASE-7403).
> > > > > How I've understood regions which will be passed to merge method will be
> > > > > unavailable for some time.
> > > > >
> > > > > That means:
> > > > > 1. Some data will be unavailable some time.
> > > > > 2. If client will try to write data to these regions it will get
> > > > > exceptions.
> > > > >
> > > > > Are above sentences correct?
> > > > >
> > > > > Somebody can estimate time which 1 and 2 will be true? Seconds, minutes or
> > > > > hours? Is there any way to avoid 1 and 2?
> > > > >
> > > > > I am asking because now we have problem during time with number of regions
> > > > > (our key contains timestamp), count of regions growing constantly
> > > > > (splitting) and it become a cause of performance problem with time.
> > > > > For avoiding this effect we use 2 tables:
> > > > > 1. First table we use for writing and reading data.
> > > > > 2. Second we use only for reading data.
> > > > >
> > > > > After some time we truncate second table and rotate these tables (first
> > > > > become second and second become first). That allow us control count of
> > > > > regions, but solution looks a bit ugly, I looked at 'online region merge',
> > > > > but we can't live with restrictions I've described in first part of
> > > > > question.
> > > > >
> > > > > Can somebody help with answers?
> > > > >
> > > > > Thx, Vladimir Tretyakov.
> > > > >
> > > >
> > >
> >
>
