hbase-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: BulkLoad 200GB table with one region. Is it OK?
Date Thu, 02 Oct 2014 19:24:18 GMT
Says that splitting is just placing a 'reference' file.
Why should there be massive splitting?

2014-10-02 23:08 GMT+04:00 Jean-Marc Spaggiari <jean-marc@spaggiari.org>:

> Hi Serega,
> Bulk load just "pushes" the file into an HBase region, so there should not be
> any issue. The split, however, might take some time because HBase will have to
> split it again and again until it becomes small enough. So if your max file
> size is 10GB, it will split it to 100GB then 50GB then 25GB then 12GB then
> 6GB... Each time, everything will be rewritten. A LOT of wasted IO.
> So the answer is: yes, HBase can handle it, BUT it's not a good practice. Better
> to split the table beforehand and generate the bulk load files based on the
> pre-split regions. Also, it might affect the other tables, because HBase will
> have to do massive IO, which in the end might impact performance.
> JM
> 2014-10-02 15:03 GMT-04:00 Serega Sheypak <serega.sheypak@gmail.com>:
> > Hi, I'm doing an HBase bulk load into an empty table.
> > Input data size is 200GB.
> > Is it OK to load the data into one default region and then wait while HBase
> > splits the 200GB region?
> >
> > I don't have any SLA for the initial load. I can wait until HBase splits
> > the initial load files.
> > This table is READ only.
> >
> > The only consideration is not to affect other tables and not to cause HBase
> > cluster degradation.
> >
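
The halving sequence quoted above (200GB to 100GB, 50GB, 25GB, 12GB, 6GB) can be checked with a few lines of arithmetic. This is only a rough sketch, not HBase's actual split logic: it assumes every split cuts a region exactly in half, that the "max file size" is a fixed 10GB threshold (presumably the hbase.hregion.max.filesize setting), and that keys are evenly distributed. The function names are made up for illustration.

```python
import math


def split_rounds(total_gb, max_region_gb):
    """Region size after each round of halving, until it fits under the max.

    Assumption: each split produces two equal halves.
    """
    sizes = []
    size = float(total_gb)
    while size > max_region_gb:
        size /= 2
        sizes.append(size)
    return sizes


def presplit_region_count(total_gb, max_region_gb):
    """Smallest number of equal regions that keeps each at or under the max.

    Assumption: data is spread evenly across the key space.
    """
    return math.ceil(total_gb / max_region_gb)


if __name__ == "__main__":
    rounds = split_rounds(200, 10)
    print("region size after each split round (GB):", rounds)
    print("split rounds needed:", len(rounds))
    print("regions to pre-split into:", presplit_region_count(200, 10))
```

Under these assumptions a single 200GB region needs five rounds of splitting before every region is below 10GB, while creating the table with about 20 regions up front avoids those rounds entirely.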
