From: w00t w00t <w00tel@yahoo.de>
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive and Lzo Compression
Date: Mon, 19 Aug 2013 09:06:20 +0100 (BST)
Message-ID: <1376899580.66100.YahooMailNeo@web173202.mail.ir2.yahoo.com>
References: <1376468125.38055.YahooMailNeo@web173206.mail.ir2.yahoo.com>
Mailing-List: contact user-help@hive.apache.org; run by ezmlm

My scenario is a bit different - I am using external tables.

So I uploaded some LZO-compressed files into HDFS, generated the lzo-index files, and finally created the external table without the specific STORED AS clause.
A SELECT statement on the table still works.

Does it work transparently? So, Hadoop sees the .lzo extension of my files and knows how to decompress them?


________________________________
From: Nitin Pawar <nitinpawar432@gmail.com>
To: "user@hive.apache.org" <user@hive.apache.org>
Sent: 19:54 Wednesday, 14 August 2013
Subject: Re: Hive and Lzo Compression

Please correct me if I have not understood the question correctly.
You created a table definition without mentioning a STORED AS clause,
then you loaded data into the table from a compressed file,
then did a SELECT query and it still works,
but how did it figure out which compression codec to use?

Am I stating it correctly?


On Wed, Aug 14, 2013 at 11:11 PM, Sanjay Subramanian <Sanjay.Subramanian@wizecommerce.com> wrote:

> That is really interesting... let me try and think of a reason... meanwhile, any other LZO Hive Samurais out there? Please help with some guidance.
>
> sanjay
>
> From: w00t w00t <w00tel@yahoo.de>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00tel@yahoo.de>
> Date: Wednesday, August 14, 2013 1:15 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: Hive and Lzo Compression
>
> Thanks for your reply.
>
> The interesting thing I experience is that the SELECT query still works - even when I do not specify the STORED AS clause... that puzzles me a bit.
>
> ________________________________
> From: Sanjay Subramanian <Sanjay.Subramanian@wizecommerce.com>
> To: "user@hive.apache.org" <user@hive.apache.org>; w00t w00t <w00tel@yahoo.de>
> Sent: 3:44 Wednesday, 14 August 2013
> Subject: Re: Hive and Lzo Compression
>
> Hi
>
> I think the CREATE TABLE without the STORED AS clause will not give any errors while creating the table.
> However, when you query that table, since the table contains .lzo files, you would get errors.
> With external tables, u r separating the table creation (definition) from the data. So only at the time of querying that table might Hive report errors.
>
> LZO compression rocks! I am so glad I used it in our projects here.
>
> Regards
>
> sanjay
>
> From: w00t w00t <w00tel@yahoo.de>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00tel@yahoo.de>
> Date: Tuesday, August 13, 2013 12:13 AM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: Hive and Lzo Compression
>
> Thanks for your replies and the link.
>
> I could get it working, but wondered why the CREATE TABLE statement worked without the STORED AS clause as well... that's what puzzles me a bit...
>
> But I will use the STORED AS clause to be on the safe side.
>
> ________________________________
> From: Lefty Leverenz <leftyleverenz@gmail.com>
> To: user@hive.apache.org
> CC: w00t w00t <w00tel@yahoo.de>
> Sent: 19:06 Saturday, 10 August 2013
> Subject: Re: Hive and Lzo Compression
>
> I'm not seeing any documentation link in Sanjay's message, so here it is again (in the Hive wiki's language manual): https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO.
>
> On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian <Sanjay.Subramanian@wizecommerce.com> wrote:
>
>> Please refer to this documentation here
>> Let me know if u need more clarifications so that we can make this document better and complete
>>
>> Thanks
>>
>> sanjay
>>
>> From: w00t w00t <w00tel@yahoo.de>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00tel@yahoo.de>
>> Date: Thursday, August 8, 2013 2:02 AM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Hive and Lzo Compression
>>
>> Hello,
>>
>> I have started to run Hive with Lzo compression on Hortonworks 1.2
>>
>> I have managed to install/configure Lzo, and hive -e "set io.compression.codecs" shows me the Lzo codecs:
>> io.compression.codecs=
>> org.apache.hadoop.io.compress.GzipCodec,
>> org.apache.hadoop.io.compress.DefaultCodec,
>> com.hadoop.compression.lzo.LzoCodec,
>> com.hadoop.compression.lzo.LzopCodec,
>> org.apache.hadoop.io.compress.BZip2Codec
>>
>> However, I have some questions where I would be happy if you could help me.
>>
>> (1) CREATE TABLE statement
>>
>> I read in different postings that in the CREATE TABLE statement I have to use the following STORAGE clause:
>>
>> CREATE EXTERNAL TABLE txt_table_lzo (
>>    txt_line STRING
>> )
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
>> STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>> LOCATION '/user/myuser/data/in/lzo_compressed';
>>
>> It now works without any problems to execute SELECT statements on this table with Lzo data.
>>
>> However, I also created a table on the same data without this STORAGE clause:
>>
>> CREATE EXTERNAL TABLE txt_table_lzo_tst (
>>    txt_line STRING
>> )
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
>> LOCATION '/user/myuser/data/in/lzo_compressed';
>>
>> The interesting thing is, it works as well when I execute a SELECT statement on this table.
>>
>> Can you help explain why the second CREATE TABLE statement works as well?
>> What should I use in DDLs?
>> Is it best practice to use the STORED AS clause with a "DeprecatedLzoTextInputFormat"? Or should I remove it?
>>
>> (2) Output and Intermediate Compression Settings
>>
>> I want to use output compression.
>>
>> In "Programming Hive" from Capriolo, Wampler, Rutherglen the following commands are recommended:
>> SET hive.exec.compress.output=true;
>> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>> However, in some other places in forums, I found the following recommended settings:
>> SET hive.exec.compress.output=true;
>> SET mapreduce.output.fileoutputformat.compress=true;
>> SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>> Am I right that the first settings are for Hadoop versions prior to 0.23?
>> Or is there any other reason why the settings are different?
>>
>> I am using Hadoop 1.1.2 with Hive 0.10.0.
>> Which settings would you recommend to use?
>>
>> --------------
>> I also want to compress intermediate results.
>>
>> Again, in "Programming Hive" the following settings are recommended:
>> SET hive.exec.compress.intermediate=true;
>> SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>> Is this the right setting?
>>
>> Or should I again use the settings (which look more valid for Hadoop 0.23 and greater)?:
>> SET hive.exec.compress.intermediate=true;
>> SET mapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>> Thanks
>>
>> CONFIDENTIALITY NOTICE
>> ======================
>> This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.
>
> --
> Lefty

--
Nitin Pawar
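
A note on the question at the top of this thread (whether Hadoop recognises the .lzo extension by itself): as far as I know, Hadoop's plain text input path chooses a decompression codec by matching the file name suffix against the default extension of each codec registered in io.compression.codecs. The Python sketch below is only a simplified model of that lookup, not Hadoop's or Hive's code. The function name `codec_for_path` and the example file names are hypothetical, and the suffix table is my assumption about the default extensions of the five codecs listed in the original post.

```python
# Simplified model of choosing a decompression codec from a file name suffix.
# Assumption: these are the default extensions of the codecs that the
# original post lists under io.compression.codecs.
DEFAULT_EXTENSIONS = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".lzo_deflate": "com.hadoop.compression.lzo.LzoCodec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
}


def codec_for_path(path):
    """Return the codec class name whose suffix matches path, or None."""
    # Check longer suffixes first so the most specific extension wins.
    for suffix in sorted(DEFAULT_EXTENSIONS, key=len, reverse=True):
        if path.endswith(suffix):
            return DEFAULT_EXTENSIONS[suffix]
    # No registered suffix matched: the file would be read as plain text.
    return None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ("part-00000.lzo", "part-00000.lzo.index", "part-00000.txt"):
        print(name, "->", codec_for_path(name))
```

If this model is right, it would explain why the table without the STORED AS clause can still be read. It does not show that the clause is redundant: to my understanding, the LZO-specific input format is what makes use of the index files for splitting large .lzo files, and in this model an .lzo.index file matches no codec at all.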