hbase-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: hbase doubts
Date Thu, 16 Jul 2015 13:48:04 GMT
My understanding is as follows (please feel free to correct me if I am wrong):

For your first question, I think it is less about efficiency and more about
convenience: TableOutputFormat gives you an output format out of the box that
does the Puts for you with the default recommended config settings (client-side
buffering/flushing, WAL, etc.). You can certainly extend and customize it if you want.
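
To make that concrete, here is a rough sketch of wiring a reducer to
TableOutputFormat via TableMapReduceUtil (the table name "my_table" and the
column family/qualifier "cf"/"q" are just placeholders, and you would still
plug in your own mapper and input path where the comment indicates):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class PutsViaTableOutputFormat {

  // Reducer that turns each (key, values) group into a single Put.
  // TableOutputFormat (set up by initTableReducerJob below) sends the Puts
  // to the table with the default client settings.
  public static class PutReducer
      extends TableReducer<Text, Text, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(Bytes.toBytes(key.toString()));
      for (Text v : values) {
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
            Bytes.toBytes(v.toString()));
      }
      context.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "puts-via-tableoutputformat");
    job.setJarByClass(PutsViaTableOutputFormat.class);
    // ... set your mapper, input format and input path here ...
    // Wires in TableOutputFormat and the target table for the reducer output.
    TableMapReduceUtil.initTableReducerJob("my_table", PutReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}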

As for your second question, the job creates HFiles based on the regions of the
target table, not just in sorted order:
"At this stage, one HFile will be created per region in the output folder.
Keep in mind that the input data is almost completely re-written, so you
will need at least twice the amount of disk space available than the size
of the original data set. For example, for a 100GB mysqldump you should
have at least 200GB of available disk space in HDFS. You can delete the
dump file at the end of the process."

Source & ref:
http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
https://sreejithrpillai.wordpress.com/2015/01/08/bulkloading-data-into-hbase-table-using-mapreduce/
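
For reference, the bulk-load side usually looks roughly like the sketch below
(HBase 1.x-style API; the table name "my_table" and the staging path
"/tmp/hfiles" are placeholders). configureIncrementalLoad reads the table's
current region boundaries and sets up a total-order partitioner so each reducer
writes HFiles for exactly one region; the LoadIncrementalHFiles utility then
moves the finished files into the right regions afterwards.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadPrepare {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "prepare-hfiles");
    job.setJarByClass(BulkLoadPrepare.class);
    // ... set your mapper here; it should emit (ImmutableBytesWritable rowkey, Put) ...

    TableName tableName = TableName.valueOf("my_table");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName)) {
      // Configures HFileOutputFormat2 and partitions reducer output along
      // the table's current region boundaries.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
    }

    FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));  // staging dir (placeholder)

    // After the job finishes, hand the HFiles to the regionservers, e.g.:
    //   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles my_table
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}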

Regards,
Shahab

On Thu, Jul 16, 2015 at 9:37 AM, Shushant Arora <shushantarora09@gmail.com>
wrote:

> Is bulk put supported in HBase?
>
> And in an MR job, when we put into a table using TableOutputFormat, how is it
> more efficient than normal puts by individual reducers? Does TableOutputFormat
> not do the puts one by one?
>
> And in a bulkload Hadoop job, when we specify HFileOutputFormat, does the job
> create HFiles based on the regionserver in which they will finally land, or
> just in sorted order, with the HBase LoadIncrementalHFiles utility then
> working out which regionserver the keys of these HFiles go to by parsing the
> HFiles instead of just dumping them?
>
