hadoop-common-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: Database insertion by HAdoop
Date Mon, 18 Feb 2013 12:09:57 GMT
Hello Masoud,

You can use HBase's bulk load feature. You might find it more
efficient than the normal client APIs or TableOutputFormat.

The bulk load feature uses a MapReduce job to write the table data
in HBase's internal data format, and then loads the generated
StoreFiles directly into a running cluster. Bulk loading uses less
CPU and network resources than writing through the HBase API.
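
To give a rough feel for what the MapReduce step prepares, here is a toy
sketch in plain Java. It does not use any HBase classes: the class name,
the region start keys, and the row keys are all made up for illustration.
It only models the idea that rows are grouped by the region that owns them
and sorted by row key within each group, so that each group could be
written out as one file for that region. A real job would rely on HBase's
own output format and bulk load tooling instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BulkLoadSketch {
    // Hypothetical region start keys in ascending order.
    // The empty string stands for the start of the first region.
    static final String[] REGION_START_KEYS = {"", "g", "n", "t"};

    // Return the start key of the region whose key range contains rowKey,
    // i.e. the largest start key that is <= rowKey.
    static String regionFor(String rowKey) {
        String owner = REGION_START_KEYS[0];
        for (String start : REGION_START_KEYS) {
            if (start.compareTo(rowKey) <= 0) {
                owner = start;
            }
        }
        return owner;
    }

    // Group row keys by owning region and sort them within each group.
    // Each map entry models the contents of one file prepared for one region.
    static Map<String, List<String>> prepareFiles(List<String> rowKeys) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String key : rowKeys) {
            files.computeIfAbsent(regionFor(key), k -> new ArrayList<>()).add(key);
        }
        for (List<String> rows : files.values()) {
            Collections.sort(rows);
        }
        return files;
    }

    public static void main(String[] args) {
        // Made-up row keys arriving in arbitrary order.
        List<String> rows = Arrays.asList(
                "paper-42", "author-7", "venue-3", "journal-9", "author-1");
        System.out.println(prepareFiles(rows));
    }
}
```

Because the files are already sorted and split along region boundaries,
loading them is mostly a matter of handing each file to its region, which
is where the savings over row-by-row writes come from.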

For details, see:
http://hbase.apache.org/book/arch.bulk.load.html

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Mon, Feb 18, 2013 at 5:00 PM, Masoud <masoud@agape.hanyang.ac.kr> wrote:

>
> Dear All,
>
> We are running experiments for a scientific paper.
> We must insert data into our database for later analysis: almost
> 300 tables, each with 2,000,000 records.
> As you know, it takes a lot of time to do this with a single machine,
> so we are going to use our Hadoop cluster (32 machines) and divide the
> 300 insertion tasks between them.
> I need some hints to progress faster:
> 1- As far as I know, we don't need a Reducer; just a Mapper is enough.
> 2- So we just need to implement the Mapper class with the needed code.
>
> Please let me know if there is anything else I should consider.
>
> Best Regards
> Masoud
>
>
>
>
