From: Raj Hadoop <hadoopraj@yahoo.com>
Date: Sun, 3 Nov 2013 08:57:08 -0800 (PST)
Subject: Re: Oracle to HDFS through Sqoop and a Hive External Table
To: user@hive.apache.org, user@hadoop.apache.org, user@sqoop.apache.org, manish.hadoop.work@gmail.com

Manish,
Thanks for the reply.

1. Load to HDFS; beware of Sqoop error handling. Since it is a MapReduce-based framework, if one mapper fails you might end up with partial data.

So are you saying that if I can handle errors in Sqoop (along the lines of the sketch below), going with 100 HDFS folders/files is OK?
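A minimal sketch of the kind of per-segment error handling I mean. The connection string, credentials, table/column names, and ID range are all invented for illustration, not taken from this thread:

#!/usr/bin/env bash
# Sketch only: import ~60M customer rows in 100 customer-id segments,
# promoting each segment directory only if its import fully succeeds.
# Connection string, credentials, and table/column names are made up.
MAX_ID=60000000
SEGMENTS=100
STEP=$(( MAX_ID / SEGMENTS ))

for seg in $(seq 0 $(( SEGMENTS - 1 ))); do
  lo=$(( seg * STEP ))
  hi=$(( lo + STEP ))
  tmp=/staging/customer/seg=$seg
  final=/data/customer/seg=$seg

  if sqoop import \
       --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
       --username scott --password-file /user/raj/.ora_pass \
       --query "SELECT * FROM customer WHERE customer_id >= $lo AND customer_id < $hi AND \$CONDITIONS" \
       --split-by customer_id --num-mappers 4 \
       --target-dir "$tmp"
  then
    hadoop fs -mv "$tmp" "$final"      # promote the completed segment
  else
    hadoop fs -rm -r "$tmp"            # discard partial output; retry this segment later
    echo "segment $seg failed" >> failed_segments.txt
  fi
done

Since each segment only moves into the final directory tree on success, a failed mapper never leaves partial data where Hive can see it.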

2. Create partitions based on date and hour, if the customer table has a date or timestamp column.

I cannot rely on a date or timestamp column. Can I go with Customer ID instead (something like the sketch below)?
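Partitioning on the raw Customer ID would create millions of tiny partitions, so a common compromise is a derived segment column (e.g., customer_id modulo 100), which also matches the 100-directory layout. A hypothetical sketch; the column list is invented:

# Hypothetical DDL: an external table over the 100 segment directories,
# partitioned by a derived segment number instead of the raw Customer ID.
hive -e "
CREATE EXTERNAL TABLE customer (
  customer_id BIGINT,
  name        STRING,
  created_at  STRING
)
PARTITIONED BY (seg INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/customer';
"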

3. Think about the file format as well, as it will affect load and query time.

Can you please suggest a file format I should use?
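No format was settled on in this thread; for Hive 0.11 and later (current at the time) ORC is one reasonable candidate for a query-heavy workload. A sketch of converting the delimited import into ORC, carrying over the assumed table and column names from the DDL above:

# Hypothetical: copy the delimited data into an ORC-backed table so
# queries read a compact columnar format instead of raw text.
hive -e "
CREATE TABLE customer_orc (
  customer_id BIGINT,
  name        STRING,
  created_at  STRING
)
PARTITIONED BY (seg INT)
STORED AS ORC;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE customer_orc PARTITION (seg)
SELECT customer_id, name, created_at, seg FROM customer;
"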

4. Think about compression beforehand as well, as it will govern how the data splits and the performance of your queries.

Does compression increase or reduce performance? Isn't the advantage of compression the savings in storage?
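Compression mostly trades CPU for I/O, so on top of the storage savings it usually helps disk- or network-bound queries. The catch is splittability: a plain gzip file cannot be split, so it caps map-side parallelism. Two hypothetical knobs, neither endorsed in this thread:

# (a) Compress the raw Sqoop output (gzip by default). Saves space, but
#     each .gz file then becomes a single, unsplittable map task.
sqoop import \
  --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
  --username scott --password-file /user/raj/.ora_pass \
  --table CUSTOMER --split-by CUSTOMER_ID \
  --compress \
  --target-dir /data/customer_gz

# (b) Let a columnar format compress internally: ORC with Snappy stays
#     splittable and typically speeds up I/O-bound scans at some CPU cost.
#     (Applies to data written after the property is set.)
hive -e "ALTER TABLE customer_orc SET TBLPROPERTIES ('orc.compress'='SNAPPY');"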

- Raj


On Sunday, November 3, 2013 11:03 AM, manish.hadoop.work <manish.hadoop.work@gmail.com> wrote:
1. Load to HDFS; beware of Sqoop error handling. Since it is a MapReduce-based framework, if one mapper fails you might end up with partial data.

2. Create partitions based on date and hour, if the customer table has a date or timestamp column.

3. Think about the file format as well, as it will affect load and query time.

4. Think about compression beforehand as well, as it will govern how the data splits and the performance of your queries.

Regards,
Manish



Sent from my T-Mobile 4G LTE Device


-------- Original message --------
From: Raj Hadoop <hadoopraj@yahoo.com>
Date: 11/03/2013 7:39 AM (GMT-08:00)
To: Hive <user@hive.apache.org>, Sqoop <user@sqoop.apache.org>, User <user@hadoop.apache.org>
Subject: Oracle to HDFS through Sqoop and a Hive External Table


Hi,

I am sending this to the three dist-lists of Hadoop, Hive, and Sqoop, as this question is closely related to all three areas.

I have this requirement.

I have a big table in Oracle (about 60 million rows; primary key: Customer Id). I want to bring this into HDFS and then create a Hive external table. My requirement is to run queries on this Hive table (at this time I do not know what queries I would be running).

Is the following a good design for the above problem? Any pros and cons of this?

1) Load the table into HDFS using Sqoop, into multiple folders (divide the Customer Ids into 100 segments).
2) Create a Hive external partitioned table based on the above 100 HDFS directories.
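For what it's worth, step 2 would reduce to a loop that registers each Sqoop output directory as one partition; a sketch under the same assumed names and directory layout as above:

# Hypothetical: register the 100 Sqoop output directories as partitions
# of the external table defined earlier.
for seg in $(seq 0 99); do
  hive -e "ALTER TABLE customer ADD IF NOT EXISTS PARTITION (seg=$seg)
           LOCATION '/data/customer/seg=$seg';"
done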


Thanks,
Raj

