From: Mohammad Tariq
Date: Mon, 18 Feb 2013 17:39:57 +0530
Subject: Re: Database insertion by Hadoop
To: user@hadoop.apache.org

Hello Masoud,

You can use the Bulk Load feature. You might find it more efficient than the normal client APIs or the TableOutputFormat.

The bulk load feature uses a MapReduce job to write table data out in HBase's internal data format, and then directly loads the generated StoreFiles into a running cluster. Bulk loading uses less CPU and network than pushing each row through the HBase API.

For detailed info, see: http://hbase.apache.org/book/arch.bulk.load.html
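If it helps, here is a rough, untested sketch of what such a job can look like. The table name "mytable", the column family "cf", and the simple CSV parsing in the mapper are placeholders I made up for illustration; configureIncrementalLoad() and LoadIncrementalHFiles are the standard HBase MapReduce utilities the link above describes:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Map phase: parse one CSV line into a Put keyed by the row key.
  // (Hypothetical format: first field is the row key, second a value.)
  public static class CsvToPutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      byte[] rowKey = Bytes.toBytes(fields[0]);
      Put put = new Put(rowKey);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(fields[1]));
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hbase-bulk-load");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(CsvToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Wires in the sorting reduce phase, a TotalOrderPartitioner, and
    // HFileOutputFormat so the output lines up with the table's regions.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (!job.waitForCompletion(true)) {
      System.exit(1);
    }

    // Move the generated StoreFiles into the running table.
    new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
    table.close();
  }
}

You would run it with the input directory and a scratch output directory as arguments; the scratch directory ends up holding the StoreFiles that doBulkLoad() then moves into the table.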
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Mon, Feb 18, 2013 at 5:00 PM, Masoud <masoud@agape.hanyang.ac.kr> wrote:

> Dear All,
>
> We are going to run the experiments for a scientific paper.
> We must insert data into our database for later analysis: almost
> 300 tables, each with about 2,000,000 records.
> As you know, doing this on a single machine takes a lot of time,
> so we are going to use our Hadoop cluster (32 machines) and divide
> the 300 insertion tasks among them.
> I need some hints to make this faster:
> 1- As far as I know, we don't need a Reducer; a Mapper alone is enough.
> 2- So we just need to implement the Mapper class with the needed code.
>
> Please let me know if there is any point I am missing.
>
> Best Regards
> Masoud
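P.S. Regarding points 1 and 2 above: for the direct-insertion approach you describe, a map-only job is indeed enough. A bare, untested skeleton (all names are placeholders, and the actual insert inside map() is left as a comment since it depends on your database):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class MapOnlyInsertJob {

  public static class InsertMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Point 2: put the per-record work here, e.g. parse 'value' and add
      // it to a batched JDBC INSERT. Nothing needs to be emitted; the
      // database write is the job's only output.
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "map-only-insert");
    job.setJarByClass(MapOnlyInsertJob.class);
    job.setMapperClass(InsertMapper.class);
    job.setNumReduceTasks(0);  // point 1: no reducer, mappers write directly
    job.setOutputFormatClass(NullOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note that the bulk-load route sketched earlier is not map-only: configureIncrementalLoad() adds a sorting reduce phase behind the scenes.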