From: Michael Segel <michael_segel@hotmail.com>
Subject: Re: Database insertion by Hadoop
Date: Mon, 18 Feb 2013 10:57:20 -0600
To: user@hadoop.apache.org

Nope, HBase wasn't mentioned.
The OP could be talking about using external tables and Hive; a sketch of that route is below.
The OP could still be stuck in the RDBMS world and hasn't flattened his data yet.

2 million records? Kinda small, dontcha think?

Not Enough Information...
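On the external-tables-and-Hive guess: the flavor there is to land the flat files in HDFS and point a table definition at them, so there is no insert job at all. A purely hypothetical sketch through the HiveServer1-era JDBC driver; the host, HDFS path, and two-column schema are all invented:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveExternalTable {
  public static void main(String[] args) throws Exception {
    // HiveServer1-era JDBC driver; host and port are placeholders.
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con = DriverManager.getConnection(
        "jdbc:hive://hivehost:10000/default", "", "");
    Statement st = con.createStatement();
    // An external table is just metadata over existing files;
    // nothing gets copied or inserted.
    st.execute("CREATE EXTERNAL TABLE papers (id BIGINT, title STRING) "
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        + "LOCATION '/user/masoud/papers'");
    con.close();
  }
}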

On Feb 18, 2013, at 8:58 AM, Hemanth Yamijala <yhemanth@thoughtworks.com> wrote:

> What database is this? Was HBase mentioned?
>
> On Monday, February 18, 2013, Mohammad Tariq wrote:
> Hello Masoud,
>
> You can use the Bulk Load feature. You might find it more
> efficient than the normal client APIs or using TableOutputFormat.
>
> The bulk load feature uses a MapReduce job to output table data
> in HBase's internal data format, and then directly loads the
> generated StoreFiles into a running cluster. Using bulk load takes
> less CPU and network resources than simply using the HBase API.
>
> For detailed info, you can go here:
> http://hbase.apache.org/book/arch.bulk.load.html
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
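To make Tariq's two steps concrete, here is roughly what a bulk-load driver looked like against the 0.94-era HBase APIs: a MapReduce job writes HFiles with HFileOutputFormat, then LoadIncrementalHFiles moves them into the live table. This is only a sketch; the table name "mytable", column family "cf", qualifier "val", and the two-field CSV input are all invented for illustration.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Turns one "rowkey,value" CSV line into a KeyValue in HBase's own format.
  static class HFileMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] f = line.toString().split(",", 2);
      byte[] row = Bytes.toBytes(f[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("cf"),
          Bytes.toBytes("val"), Bytes.toBytes(f[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");

    Job job = new Job(conf, "hfile-prepare");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(HFileMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    Path out = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, out);

    // Wires in the sort reducer, TotalOrderPartitioner and HFileOutputFormat
    // so the HFiles come out sorted and aligned with the table's regions.
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Step 2: move the finished HFiles into the live table.
      new LoadIncrementalHFiles(conf).doBulkLoad(out, table);
    }
  }
}

configureIncrementalLoad() is what makes the second step cheap: it sets up the total-order sort so the output HFiles line up with region boundaries, and the final load is then, roughly, file renames instead of millions of puts.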
> On Mon, Feb 18, 2013 at 5:00 PM, Masoud <masoud@agape.hanyang.ac.kr> wrote:
>
> Dear All,
>
> We are going to run the experiments for a scientific paper.
> We must insert the data into our database for later analysis: almost
> 300 tables, each with 2,000,000 records.
> As you know, it takes a lot of time to do this on a single machine,
> so we are going to use our Hadoop cluster (32 machines) and divide the 300
> insertion tasks between them.
> I need some hints to make this go faster:
> 1- As I understand it, we don't need a Reducer; a Mapper alone is enough.
> 2- So we only need to implement the Mapper class with the needed code.
>
> Please let me know if there are any points I am missing.
>
> Best Regards,
> Masoud
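On Masoud's points 1 and 2: yes, a map-only job is the right shape; set the number of reduces to 0 and emit one record per INSERT. Below is a minimal, hypothetical sketch using Hadoop's stock DBOutputFormat; the JDBC driver, URL, credentials, the "papers" table, and its two columns are all placeholders.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class ParallelInsert {

  // One target-table row; DBOutputFormat hands it the INSERT statement.
  public static class PaperRecord implements Writable, DBWritable {
    long id;
    String title;

    public void write(PreparedStatement st) throws SQLException {
      st.setLong(1, id);
      st.setString(2, title);
    }
    public void readFields(ResultSet rs) throws SQLException {
      id = rs.getLong(1);
      title = rs.getString(2);
    }
    // Plain Writable methods; unused in a map-only job but kept by convention.
    public void write(DataOutput out) throws IOException {
      out.writeLong(id);
      Text.writeString(out, title);
    }
    public void readFields(DataInput in) throws IOException {
      id = in.readLong();
      title = Text.readString(in);
    }
  }

  public static class InsertMapper
      extends Mapper<LongWritable, Text, PaperRecord, NullWritable> {
    private final PaperRecord record = new PaperRecord();

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] f = line.toString().split(",", 2);  // "id,title" per line
      record.id = Long.parseLong(f[0]);
      record.title = f[1];
      ctx.write(record, NullWritable.get());       // becomes one batched INSERT
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Driver class, URL and credentials are placeholders.
    DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://dbhost/experiments", "user", "passwd");

    Job job = new Job(conf, "parallel-insert");
    job.setJarByClass(ParallelInsert.class);
    job.setMapperClass(InsertMapper.class);
    job.setNumReduceTasks(0);                      // map-only, per point 1
    job.setOutputKeyClass(PaperRecord.class);
    job.setOutputValueClass(NullWritable.class);
    DBOutputFormat.setOutput(job, "papers", "id", "title");
    FileInputFormat.addInputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Whether 32 machines' worth of mappers actually makes the inserts faster is a separate question: the target database, not Hadoop, is likely to be the bottleneck, which is exactly why the thread keeps asking what that database is.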




