hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: HBase 2, bulk import question
Date Thu, 18 Jul 2019 16:32:49 GMT
One thing to add: when you bulkload your files, they will, if needed, be
split according to the region boundaries.

Because some "natural" splits may have happened on the table side between
the time you start your job and the time you push your files, the
bulkloader has to be able to re-split your generated data.
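
If you load with LoadIncrementalHFiles, that re-split happens for you. A
minimal sketch with the HBase 2 Java client (the "demo" table name and the
/staging/hfiles directory are placeholders; in HBase 2 the loader class
lives in org.apache.hadoop.hbase.tool):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.RegionLocator;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;

  public class BulkLoadDemo {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      TableName name = TableName.valueOf("demo");
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Admin admin = conn.getAdmin();
           Table table = conn.getTable(name);
           RegionLocator locator = conn.getRegionLocator(name)) {
        // The loader checks every HFile against the table's current
        // region boundaries and re-splits any file that straddles a
        // boundary before handing it to the owning region server.
        new LoadIncrementalHFiles(conf)
            .doBulkLoad(new Path("/staging/hfiles"), admin, table, locator);
      }
    }
  }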

JMS

On Thu, Jul 18, 2019 at 9:55 AM, OpenInx <openinx@gmail.com> wrote:

> Austin is right. The pre-splitting mainly matters when generating and
> loading HFiles: when you do a bulkload, each generated HFile is loaded
> into the region whose rowkey range contains the HFile's rowkey interval.
> Without pre-splitting, all HFiles end up in a single region, the bulkload
> is time-consuming, and that region easily becomes a hotspot once queries
> come in.
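>
> As an illustration only, a minimal sketch of pre-splitting at
> table-creation time with the HBase 2 Java client (the "demo" table, the
> "cf" family and the split points are made-up examples; pick split points
> that match your rowkey distribution):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.TableName;
>   import org.apache.hadoop.hbase.client.Admin;
>   import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
>   import org.apache.hadoop.hbase.client.Connection;
>   import org.apache.hadoop.hbase.client.ConnectionFactory;
>   import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   public class PreSplitDemo {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = HBaseConfiguration.create();
>       try (Connection conn = ConnectionFactory.createConnection(conf);
>            Admin admin = conn.getAdmin()) {
>         // Three split points give four regions; choose points that
>         // match the rowkey distribution of the data you will generate.
>         byte[][] splits = {
>             Bytes.toBytes("row-25000"),
>             Bytes.toBytes("row-50000"),
>             Bytes.toBytes("row-75000")
>         };
>         admin.createTable(
>             TableDescriptorBuilder.newBuilder(TableName.valueOf("demo"))
>                 .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
>                 .build(),
>             splits);
>       }
>     }
>   }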
>
> For demo implementations, see:
> [1]. https://hbase.apache.org/book.html#arch.bulk.load
> [2]. http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
>
> Thanks.
>
> On Thu, Jul 18, 2019 at 9:21 PM Austin Heyne <aheyne@ccri.com> wrote:
>
> > Bulk importing requires that the table the data is being imported into
> > already exists. This is because the MapReduce job needs to extract the
> > region start/end keys in order to drive the reducers. This means that
> > you need to create your table beforehand with the appropriate
> > pre-splits, then run your bulk ingest and bulk load to get the data
> > into the table. If you do not pre-split your table, you will end up
> > with a single reducer in your bulk ingest job. This also means that
> > your bulk ingest cluster will need to be able to communicate with your
> > HBase instance.
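> >
> > As a rough sketch of that wiring with the HBase 2 APIs (the "demo"
> > table, the tab-separated toy input format and the paths are
> > placeholders): HFileOutputFormat2.configureIncrementalLoad is the call
> > that reads the region boundaries and sets up the TotalOrderPartitioner
> > with one reducer per region.
> >
> >   import java.io.IOException;
> >   import org.apache.hadoop.conf.Configuration;
> >   import org.apache.hadoop.fs.Path;
> >   import org.apache.hadoop.hbase.HBaseConfiguration;
> >   import org.apache.hadoop.hbase.TableName;
> >   import org.apache.hadoop.hbase.client.Connection;
> >   import org.apache.hadoop.hbase.client.ConnectionFactory;
> >   import org.apache.hadoop.hbase.client.Put;
> >   import org.apache.hadoop.hbase.client.RegionLocator;
> >   import org.apache.hadoop.hbase.client.Table;
> >   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> >   import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
> >   import org.apache.hadoop.hbase.util.Bytes;
> >   import org.apache.hadoop.io.LongWritable;
> >   import org.apache.hadoop.io.Text;
> >   import org.apache.hadoop.mapreduce.Job;
> >   import org.apache.hadoop.mapreduce.Mapper;
> >   import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> >   import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
> >
> >   public class BulkIngestJob {
> >     // Toy mapper: each input line is "rowkey<TAB>value".
> >     public static class LineToPutMapper
> >         extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> >       @Override
> >       protected void map(LongWritable offset, Text line, Context ctx)
> >           throws IOException, InterruptedException {
> >         String[] parts = line.toString().split("\t", 2);
> >         byte[] row = Bytes.toBytes(parts[0]);
> >         Put put = new Put(row);
> >         put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
> >             Bytes.toBytes(parts[1]));
> >         ctx.write(new ImmutableBytesWritable(row), put);
> >       }
> >     }
> >
> >     public static void main(String[] args) throws Exception {
> >       Configuration conf = HBaseConfiguration.create();
> >       Job job = Job.getInstance(conf, "bulk-ingest-demo");
> >       job.setJarByClass(BulkIngestJob.class);
> >       job.setMapperClass(LineToPutMapper.class);
> >       job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> >       job.setMapOutputValueClass(Put.class);
> >       FileInputFormat.addInputPath(job, new Path(args[0]));   // raw input
> >       FileOutputFormat.setOutputPath(job, new Path(args[1])); // HFile dir
> >       TableName name = TableName.valueOf("demo");
> >       try (Connection conn = ConnectionFactory.createConnection(conf);
> >            Table table = conn.getTable(name);
> >            RegionLocator locator = conn.getRegionLocator(name)) {
> >         // Reads the table's region start/end keys and configures the
> >         // partitioner, sort reducer and reducer count to match them.
> >         HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
> >       }
> >       System.exit(job.waitForCompletion(true) ? 0 : 1);
> >     }
> >   }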
> >
> > -Austin
> >
> > On 7/18/19 4:39 AM, Michael wrote:
> > > Hi,
> > >
> > > I looked into the possibility of bulk importing into HBase, but
> > > somehow I don't get it. I am not able to pre-split the data, so does
> > > bulk importing work without pre-splitting?
> > > As I understand it, instead of putting the data, I create the HBase
> > > region files (HFiles) directly, but all the tutorials I read mention
> > > pre-splitting...
> > >
> > > So, is pre-splitting essential for bulk importing?
> > >
> > > It would be really helpful if someone could point me to a demo
> > > implementation of a bulk import.
> > >
> > > Thanks for helping
> > >   Michael
> > >
> > >
> >
>
