hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: BulkLoad 200GB table with one region. Is it OK?
Date Thu, 02 Oct 2014 19:08:34 GMT
Hi Serega,

Bulk load just "pushes" the files into an HBase region, so the load itself
should not be an issue. The splits, however, might take some time, because
HBase will have to split the region again and again until it becomes small
enough. So if your max file size is 10GB, it will split the 200GB into 100GB,
then 50GB, then 25GB, then 12GB, then 6GB... Each time, everything will be
rewritten. That is a LOT of wasted IO.
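
To put rough numbers on that cascade, here is a minimal sketch in Python. It
assumes a simplified model where every region halves exactly and each split
round rewrites the full data set once; the function name split_cascade is
made up for illustration, and the real split policy and compaction behaviour
of a given HBase version will differ in the details.

```python
def split_cascade(total_gb, max_region_gb):
    """Model repeated halving of one region until regions fit the max size.

    Returns (rounds, regions, final_region_gb, rewritten_gb).
    Simplifying assumption: each split round rewrites all of the data once.
    """
    rounds = 0
    regions = 1
    region_gb = float(total_gb)
    rewritten_gb = 0.0
    while region_gb > max_region_gb:
        region_gb /= 2             # every region splits in half
        regions *= 2
        rounds += 1
        rewritten_gb += total_gb   # the whole data set is rewritten again
    return rounds, regions, region_gb, rewritten_gb


rounds, regions, region_gb, rewritten_gb = split_cascade(200, 10)
print(f"{rounds} split rounds, {regions} regions of {region_gb} GB, "
      f"about {rewritten_gb} GB rewritten")
```

Under those assumptions, the 200GB load with a 10GB max file size goes through
5 split rounds and rewrites roughly 1000GB in total, five times the size of
the input.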

So the answer is: yes, HBase can handle it, BUT it's not good practice. Better
to pre-split the table and generate the bulk load files based on the split
regions. Also, it might affect the other tables, because HBase will have to
do massive IO during the splits, which in the end might impact the
performance of the whole cluster.
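
As an illustration of the pre-split idea, here is a small Python sketch that
computes evenly spaced split keys. It assumes row keys start with a uniformly
distributed 4-digit hex prefix, which may well not match the real key design;
the function name presplit_keys and its parameters are hypothetical.

```python
import math


def presplit_keys(total_gb, target_region_gb, key_space=0x10000, width=4):
    """Return evenly spaced hex split keys for a table to be pre-split.

    Assumption: row keys are uniformly distributed over a hex prefix space
    of `key_space` values, each formatted with `width` hex digits.
    """
    n_regions = math.ceil(total_gb / target_region_gb)
    # n_regions regions need n_regions - 1 boundaries between them.
    return [format(i * key_space // n_regions, f"0{width}x")
            for i in range(1, n_regions)]


keys = presplit_keys(200, 10)
print(len(keys) + 1, "regions, first boundaries:", keys[:3])
```

Keys like these can be supplied when the table is created, for example with
the SPLITS option of the HBase shell's create command, so that the bulk load
files are generated per region and no split is needed after the load.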

JM

2014-10-02 15:03 GMT-04:00 Serega Sheypak <serega.sheypak@gmail.com>:

> Hi, I'm doing HBase bulk load to an empty table.
> Input data size is 200GB
> Is it OK to load the data into one default region and then wait while HBase
> splits the 200GB region?
>
> I don't have any SLA for the initial load. I can wait until HBase splits
> the initial load files.
> This table is READ only.
>
> The only consideration is to not affect the other tables and not cause HBase
> cluster degradation.
>
