hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Region split during mapreduce
Date Sat, 01 Nov 2014 07:18:10 GMT
I do not believe that to be true.
HBase only uses Region boundaries to identify useful scan ranges during the setup of the job.
These ranges will work regardless of whether the number of regions increases later or not.
The worst case is that a single mapper might be scanning multiple regions (those that are
the result of a split of the region it was supposed to scan).
Regions are unavailable for a short time during a split, but the mappers are normal HBase
clients and so they wait out the splits by retrying.
-- Lars
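
The point about scan ranges can be illustrated with a small toy model. This is only a sketch and not HBase code: the "table" is a plain sorted map, "regions" are just a list of start keys, and the names SplitTolerantScan, ScanRange, rangesFromRegions, scan and regionsTouched are all hypothetical. It assumes, as described above, that ranges are derived from the region boundaries once at job setup and that the scan itself is driven purely by row-key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, NOT HBase code: a table is a sorted map of row keys and the
// regions are a sorted list of start keys that partition the key space.
class SplitTolerantScan {

    // A scan range fixed at job setup: [startRow, stopRow).
    // An empty stopRow means "scan to the end of the table".
    static final class ScanRange {
        final String startRow;
        final String stopRow;

        ScanRange(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    // Derive one scan range per region from the start keys known at setup time.
    static List<ScanRange> rangesFromRegions(List<String> regionStartKeys) {
        List<ScanRange> ranges = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            String stop = i + 1 < regionStartKeys.size() ? regionStartKeys.get(i + 1) : "";
            ranges.add(new ScanRange(start, stop));
        }
        return ranges;
    }

    // Scan purely by row-key range; the current region layout plays no part.
    static List<String> scan(TreeMap<String, String> table, ScanRange range) {
        Map<String, String> view = range.stopRow.isEmpty()
                ? table.tailMap(range.startRow, true)
                : table.subMap(range.startRow, true, range.stopRow, false);
        return new ArrayList<>(view.keySet());
    }

    // Count how many of the current regions overlap the given range.
    static int regionsTouched(List<String> regionStartKeys, ScanRange range) {
        int touched = 0;
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String regionStart = regionStartKeys.get(i);
            String regionEnd = i + 1 < regionStartKeys.size() ? regionStartKeys.get(i + 1) : null;
            boolean startsBeforeRangeEnds =
                    range.stopRow.isEmpty() || regionStart.compareTo(range.stopRow) < 0;
            boolean endsAfterRangeStarts =
                    regionEnd == null || regionEnd.compareTo(range.startRow) > 0;
            if (startsBeforeRangeEnds && endsAfterRangeStarts) {
                touched++;
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String row : new String[] {"a", "f", "k", "p", "u"}) {
            table.put(row, "value-" + row);
        }

        // Job setup: two regions, so two scan ranges (one per mapper).
        List<ScanRange> ranges = rangesFromRegions(List.of("", "m"));

        // Later the first region splits at "g"; the ranges are not recomputed.
        List<String> regionsAfterSplit = List.of("", "g", "m");

        ScanRange first = ranges.get(0);
        System.out.println(scan(table, first));                      // rows a, f, k
        System.out.println(regionsTouched(regionsAfterSplit, first)); // now spans 2 regions
    }
}
```

In this model the first mapper still returns the same rows after the split; it simply ends up reading from two regions instead of one, which matches the worst case described above. The toy model does not cover the brief unavailability during a split or the client's retry behaviour.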

From: Flavio Pompermaier <pompermaier@okkam.it>
To: user@hbase.apache.org
Sent: Friday, October 31, 2014 10:23 AM
Subject: Re: Region split during mapreduce

The problem is that I don't know whether what they say at that link is true.
In the past I experienced several problems running MapReduce jobs on a
"live" HBase table, but I didn't know that MapReduce jobs could
crash if a region was splitting.
If I want to use TableSnapshotInputFormat, do I have to create the snapshot
myself, or does it automatically handle snapshot creation and deletion?
Is there any detailed reference on how to deal with such events during
MapReduce jobs?

Thanks for the support,
Flavio



On Fri, Oct 31, 2014 at 6:12 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Flavio:
> Have you considered using TableSnapshotInputFormat ?
>
> See TableMapReduceUtil#initTableSnapshotMapperJob()
>
> Cheers
>
> On Fri, Oct 31, 2014 at 10:01 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>
> > Is there anybody here..?
> >
> > On Thu, Oct 30, 2014 at 2:28 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> >
> > > Any help about this..?
> > >
> > > On Wed, Oct 29, 2014 at 9:08 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > >
> > >> Hi to all,
> > >> I was reading
> > >> http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html?m=1
> > >> and they say "still using
> > >> org.apache.hadoop.hbase.mapreduce.TableInputFormat is a big problem,
> > >> your job will fail when one of HBase Region for target HBase table is
> > >> splitting ! because the original region will be offline by splitting".
> > >>
> > >> Is that true?
> > >> Is there a solution to that?
> > >>
> > >> Best,
> > >> Flavio
> > >>
> > >
> >
>


   