From: Raj Hadoop <hadoopraj@yahoo.com>
Date: Sun, 3 Nov 2013 08:57:08 -0800 (PST)
Subject: Re: Oracle to HDFS through Sqoop and a Hive External Table
To: user@hive.apache.org, user@hadoop.apache.org, user@sqoop.apache.org, manish.hadoop.work@gmail.com

Manish,
Thanks for the reply.

1. Load to HDFS; beware of Sqoop error handling. Since it is a MapReduce-based framework, if one mapper fails you might end up with partial data.

So are you saying that if I can handle errors in Sqoop (along the lines of the sketch below), going with 100 HDFS folders/files is OK?
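A minimal sketch of the kind of per-segment error handling I mean. The connection string, credentials, table/column names, and ID range are all invented for illustration, not taken from this thread:

#!/usr/bin/env bash
# Sketch only: import ~60M customer rows in 100 customer-id segments,
# promoting each segment directory only if its import fully succeeds.
# Connection string, credentials, and table/column names are made up.
MAX_ID=60000000
SEGMENTS=100
STEP=$(( MAX_ID / SEGMENTS ))

for seg in $(seq 0 $(( SEGMENTS - 1 ))); do
  lo=$(( seg * STEP ))
  hi=$(( lo + STEP ))
  tmp=/staging/customer/seg=$seg
  final=/data/customer/seg=$seg

  if sqoop import \
       --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
       --username scott --password-file /user/raj/.ora_pass \
       --query "SELECT * FROM customer WHERE customer_id >= $lo AND customer_id < $hi AND \$CONDITIONS" \
       --split-by customer_id --num-mappers 4 \
       --target-dir "$tmp"
  then
    hadoop fs -mv "$tmp" "$final"      # promote the completed segment
  else
    hadoop fs -rm -r "$tmp"            # discard partial output; retry this segment later
    echo "segment $seg failed" >> failed_segments.txt
  fi
done

Since each segment only moves into the final directory tree on success, a failed mapper never leaves partial data where Hive can see it.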

2. Create partitions based on date and hour, if the customer table has a date or timestamp column.

I cannot rely on a date or timestamp column. Can I go with Customer ID instead (something like the sketch below)?
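Partitioning on the raw Customer ID would create millions of tiny partitions, so a common compromise is a derived segment column (e.g., customer_id modulo 100), which also matches the 100-directory layout. A hypothetical sketch; the column list is invented:

# Hypothetical DDL: an external table over the 100 segment directories,
# partitioned by a derived segment number instead of the raw Customer ID.
hive -e "
CREATE EXTERNAL TABLE customer (
  customer_id BIGINT,
  name        STRING,
  created_at  STRING
)
PARTITIONED BY (seg INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/customer';
"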

3. Think about the file format as well, as it will affect load and query time.

Can you please suggest a file format I should use?
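No format was settled on in this thread; for Hive 0.11 and later (current at the time) ORC is one reasonable candidate for a query-heavy workload. A sketch of converting the delimited import into ORC, carrying over the assumed table and column names from the DDL above:

# Hypothetical: copy the delimited data into an ORC-backed table so
# queries read a compact columnar format instead of raw text.
hive -e "
CREATE TABLE customer_orc (
  customer_id BIGINT,
  name        STRING,
  created_at  STRING
)
PARTITIONED BY (seg INT)
STORED AS ORC;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE customer_orc PARTITION (seg)
SELECT customer_id, name, created_at, seg FROM customer;
"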

4. Think about compression beforehand as well, as it will govern how the data splits and the performance of your queries.

Does compression increase or reduce performance? Isn't the advantage of compression the savings in storage?
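Compression mostly trades CPU for I/O, so on top of the storage savings it usually helps disk- or network-bound queries. The catch is splittability: a plain gzip file cannot be split, so it caps map-side parallelism. Two hypothetical knobs, neither endorsed in this thread:

# (a) Compress the raw Sqoop output (gzip by default). Saves space, but
#     each .gz file then becomes a single, unsplittable map task.
sqoop import \
  --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
  --username scott --password-file /user/raj/.ora_pass \
  --table CUSTOMER --split-by CUSTOMER_ID \
  --compress \
  --target-dir /data/customer_gz

# (b) Let a columnar format compress internally: ORC with Snappy stays
#     splittable and typically speeds up I/O-bound scans at some CPU cost.
#     (Applies to data written after the property is set.)
hive -e "ALTER TABLE customer_orc SET TBLPROPERTIES ('orc.compress'='SNAPPY');"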

- Raj


On Sunday, November 3, 2013 11:03 AM, manish.hadoop.work <manish.hadoop.work@gmail.com> wrote:
1. Load to HDFS; beware of Sqoop error handling. Since it is a MapReduce-based framework, if one mapper fails you might end up with partial data.

2. Create partitions based on date and hour, if the customer table has a date or timestamp column.

3. Think about the file format as well, as it will affect load and query time.

4. Think about compression beforehand as well, as it will govern how the data splits and the performance of your queries.

Regards,
Manish



Sent from my T-Mobile 4G LTE Device


-------- Original message --------
From: Raj Hadoop <hadoopraj@yahoo.com>
Date: 11/03/2013 7:39 AM (GMT-08:00)
To: Hive <user@hive.apache.org>, Sqoop <user@sqoop.apache.org>, User <user@hadoop.apache.org>
Subject: Oracle to HDFS through Sqoop and a Hive External Table


Hi,

I am sending this to the three dist-lists of Hadoop, Hive, and Sqoop, as this question is closely related to all three areas.

I have this requirement.

I have a big table in Oracle (about 60 million rows; primary key: Customer Id). I want to bring this into HDFS and then create a Hive external table. My requirement is to run queries on this Hive table (at this time I do not know what queries I would be running).

Is the following a good design for the above problem? Any pros and cons of this?

1) Load the table into HDFS using Sqoop, into multiple folders (divide the Customer Ids into 100 segments).
2) Create a Hive external partitioned table based on the above 100 HDFS directories.
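For what it's worth, step 2 would reduce to a loop that registers each Sqoop output directory as one partition; a sketch under the same assumed names and directory layout as above:

# Hypothetical: register the 100 Sqoop output directories as partitions
# of the external table defined earlier.
for seg in $(seq 0 99); do
  hive -e "ALTER TABLE customer ADD IF NOT EXISTS PARTITION (seg=$seg)
           LOCATION '/data/customer/seg=$seg';"
done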


Thanks,
Raj

