hbase-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Does 'online region merge' make regions unavailable for some time?
Date Thu, 22 Jan 2015 03:09:00 GMT
Thanks Enis & Ted!
A few more questions inline.

On Wed, Jan 21, 2015 at 9:53 PM, Enis Söztutar <enis.soz@gmail.com> wrote:

> Online in this context is HBase cluster being online, not individual
> regions. For the merge process, the regions go briefly offline similar to
> how splits work. It should be on the order of seconds.
>

Hm, but how could it be so quick?  Aren't regions first offlined and then
one of them is *moved*?  Or maybe data is not actually sent over the
network?

But if 2 regions are being merged, doesn't that mean that a completely new
region needs to be written (over the network, to disk, and then HDFS
replication also needs to take place)?  If regions are a few GB in size,
can that really be done in a matter of seconds?

What happens to the (in flight) writes or reads going to the regions that
are being merged?
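
To make the question about in-flight writes concrete, here is a minimal sketch, assuming the client simply retries with a pause while the region is briefly offline. The HBase client does retry failed operations according to hbase.client.retries.number and hbase.client.pause, but nothing below uses the HBase API: RetrySketch, SimulatedRegion, RegionOfflineException and putWithRetry are hypothetical names, and the offline window is simulated by a counter.

```java
public class RetrySketch {
    /** Hypothetical stand-in for the failure a client sees while a region is offline. */
    static class RegionOfflineException extends Exception {
        RegionOfflineException(String msg) { super(msg); }
    }

    /** Simulated region: rejects the first offlineCalls writes, then accepts writes. */
    static class SimulatedRegion {
        private int remainingOfflineCalls;
        int writesApplied = 0;

        SimulatedRegion(int offlineCalls) { this.remainingOfflineCalls = offlineCalls; }

        void put(String row) throws RegionOfflineException {
            if (remainingOfflineCalls > 0) {
                remainingOfflineCalls--;
                throw new RegionOfflineException("region offline (merge in progress)");
            }
            writesApplied++;
        }
    }

    /**
     * Tries the write up to maxAttempts times, sleeping pauseMs * attempt between
     * failures. Returns the number of attempts used; rethrows the last failure
     * if every attempt was rejected.
     */
    static int putWithRetry(SimulatedRegion region, String row, int maxAttempts, long pauseMs)
            throws RegionOfflineException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RegionOfflineException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                region.put(row);
                return attempt;
            } catch (RegionOfflineException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMs * attempt); // linear backoff before the next try
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        SimulatedRegion region = new SimulatedRegion(3); // offline for 3 attempts
        int attempts = putWithRetry(region, "row-1", 5, 10);
        System.out.println("write succeeded after " + attempts + " attempts");
    }
}
```

Under these assumptions the write is delayed rather than lost, as long as the offline window is shorter than the total retry budget; if it is longer, the caller sees the exception.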

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



> On Wed, Jan 21, 2015 at 10:26 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Please take a look at slides 5 and 6 in this file:
> >
> > https://issues.apache.org/jira/secure/attachment/12561887/merge%20region.pdf
> >
> > It is clear that the two regions to be merged are taken offline in
> > step 1.
> >
> > Cheers
> >
> > On Tue, Jan 20, 2015 at 5:26 PM, Otis Gospodnetic <
> > otis.gospodnetic@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Considering this is called the *online* region merge, I would assume
> > > regions being merged never go offline during the merge and both regions
> > > being merged are available for reading and writing at all times, even
> > > during the merge.... though I don't get how writes would work if one
> > > region is being moved from one RS to another.... so maybe this is not
> > > truly online and writes are either rejected or buffered/blocked until
> > > the region is moved AND merged?  Does anyone know for sure?
> > >
> > > I see this in one of the comments:
> > > Q: If one (or both) of the regions were receiving non-trivial load
> > > prior to this action, would client(s) be affected ?
> > > A: Yes, region would be off services in a short time, it is equal with
> > > moving region, e.g balance a region
> > >
> > > Also took a look at the patch:
> > >
> > > https://issues.apache.org/jira/secure/attachment/12574965/hbase-7403-trunkv33.patch
> > >
> > > And see:
> > >
> > > +    /**
> > > +     * The merging region A has been taken out of the server's online regions list.
> > > +     */
> > > +    OFFLINED_REGION_A,
> > >
> > >
> > > ... and if you look for the word "offline" in the patch I think it's
> > > pretty clear that BOTH regions being merged do go offline at some
> > > point.  I guess it could be after the merge, too, not before....
> > >
> > > ... maybe others know?
> > >
> > >
> > > Thanks,
> > > Otis
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > > On Mon, Jan 19, 2015 at 4:17 AM, Vladimir Tretyakov <
> > > vladimir.tretyakov@sematext.com> wrote:
> > >
> > > > Hi, I have one question about 'online region merge' (
> > > > https://issues.apache.org/jira/browse/HBASE-7403).
> > > > As I understand it, the regions passed to the merge method will be
> > > > unavailable for some time.
> > > >
> > > > That means:
> > > > 1. Some data will be unavailable for some time.
> > > > 2. If a client tries to write data to these regions, it will get
> > > > exceptions.
> > > >
> > > > Are the above statements correct?
> > > >
> > > > Can somebody estimate how long 1 and 2 will hold? Seconds, minutes
> > > > or hours? Is there any way to avoid 1 and 2?
> > > >
> > > > I am asking because the number of regions becomes a problem for us
> > > > over time (our key contains a timestamp): the region count grows
> > > > constantly through splitting and eventually causes performance
> > > > problems.
> > > > To avoid this effect we use 2 tables:
> > > > 1. The first table is used for writing and reading data.
> > > > 2. The second is used only for reading data.
> > > >
> > > > After some time we truncate the second table and rotate the tables
> > > > (the first becomes the second and the second becomes the first). That
> > > > allows us to control the region count, but the solution looks a bit
> > > > ugly. I looked at 'online region merge', but we can't live with the
> > > > restrictions I've described in the first part of the question.
> > > >
> > > > Can somebody help with answers?
> > > >
> > > > Thx, Vladimir Tretyakov.
> > > >
> > >
> >
>
