From: Mohammad Tariq
Date: Mon, 18 Feb 2013 17:39:57 +0530
Subject: Re: Database insertion by Hadoop
To: user@hadoop.apache.org

Hello Masoud,

You can use the Bulk Load feature. You might find it more efficient than the normal client APIs or the TableOutputFormat.

The bulk load feature uses a MapReduce job to write table data out in HBase's internal data format, and then directly loads the generated StoreFiles into a running cluster. Bulk loading uses less CPU and network than pushing each row through the HBase API.

For detailed info, see: http://hbase.apache.org/book/arch.bulk.load.html
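If it helps, here is a rough, untested sketch of what such a job can look like. The table name "mytable", the column family "cf", and the simple CSV parsing in the mapper are placeholders I made up for illustration; configureIncrementalLoad() and LoadIncrementalHFiles are the standard HBase MapReduce utilities the link above describes:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Map phase: parse one CSV line into a Put keyed by the row key.
  // (Hypothetical format: first field is the row key, second a value.)
  public static class CsvToPutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      byte[] rowKey = Bytes.toBytes(fields[0]);
      Put put = new Put(rowKey);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(fields[1]));
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hbase-bulk-load");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(CsvToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Wires in the sorting reduce phase, a TotalOrderPartitioner, and
    // HFileOutputFormat so the output lines up with the table's regions.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (!job.waitForCompletion(true)) {
      System.exit(1);
    }

    // Move the generated StoreFiles into the running table.
    new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
    table.close();
  }
}

You would run it with the input directory and a scratch output directory as arguments; the scratch directory ends up holding the StoreFiles that doBulkLoad() then moves into the table.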
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Mon, Feb 18, 2013 at 5:00 PM, Masoud <masoud@agape.hanyang.ac.kr> wrote:

> Dear All,
>
> We are going to run the experiments for a scientific paper.
> We must insert data into our database for later analysis: almost
> 300 tables, each with about 2,000,000 records.
> As you know, doing this on a single machine takes a lot of time,
> so we are going to use our Hadoop cluster (32 machines) and divide
> the 300 insertion tasks among them.
> I need some hints to make this faster:
> 1- As far as I know, we don't need a Reducer; a Mapper alone is enough.
> 2- So we just need to implement the Mapper class with the needed code.
>
> Please let me know if there is any point I am missing.
>
> Best Regards
> Masoud
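P.S. Regarding points 1 and 2 above: for the direct-insertion approach you describe, a map-only job is indeed enough. A bare, untested skeleton (all names are placeholders, and the actual insert inside map() is left as a comment since it depends on your database):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MapOnlyInsertJob {

  public static class InsertMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Point 2: put the per-record work here, e.g. parse 'value' and add
      // it to a batched JDBC INSERT. Nothing needs to be emitted; the
      // database write is the job's only output.
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "map-only-insert");
    job.setJarByClass(MapOnlyInsertJob.class);
    job.setMapperClass(InsertMapper.class);
    job.setNumReduceTasks(0);  // point 1: no reducer, mappers write directly
    job.setOutputFormatClass(NullOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note that the bulk-load route sketched earlier is not map-only: configureIncrementalLoad() adds a sorting reduce phase behind the scenes.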