hive-user mailing list archives

From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: Hive and Lzo Compression
Date Sat, 17 Aug 2013 14:40:26 GMT
As per my understanding, the underlying Hadoop framework transparently
identifies whether the files are compressed. If they are, the framework
takes care of decompression, provided the matching compression codecs
are available.
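
To make that concrete, here is a minimal sketch of how one could check this
from the Hive CLI. It rests on assumptions on my part: that the codec is
resolved from the file extension (so .lzo files map to LzopCodec when it is
listed in io.compression.codecs), and that the two tables are the ones
defined in the original question quoted below.

```sql
-- List the codecs the Hadoop framework can choose from.
SET io.compression.codecs;

-- Table created with the explicit STORED AS INPUTFORMAT ... clause.
DESCRIBE FORMATTED txt_table_lzo;

-- Table created without a STORED AS clause (Hive's default storage format).
DESCRIBE FORMATTED txt_table_lzo_tst;

-- Should succeed if a registered codec matches the compressed files.
SELECT COUNT(*) FROM txt_table_lzo_tst;
```

If the second table reports the default text input format and the query
still returns rows, the decompression is presumably being handled by the
framework rather than by anything in the table definition.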


On Thu, Aug 15, 2013 at 4:20 AM, Sanjay Subramanian <
Sanjay.Subramanian@wizecommerce.com> wrote:

>  I am not sure whether in this case the data was loaded
> or a partition was added with a location specified (some location in HDFS).
>
>  Yes, you are stating the question correctly.
>
>  sanjay
>
>   From: Nitin Pawar <nitinpawar432@gmail.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Wednesday, August 14, 2013 10:54 AM
>
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Re: Hive and Lzo Compression
>
>   Please correct me if I have misunderstood the question.
>
>  You created a table definition without a STORED AS clause,
> then loaded data into the table from a compressed file,
> then ran a SELECT query, and it still worked.
> But how did it figure out which compression codec to use?
>
>  Am I stating it correctly?
>
>
>
> On Wed, Aug 14, 2013 at 11:11 PM, Sanjay Subramanian <
> Sanjay.Subramanian@wizecommerce.com> wrote:
>
>>  That is really interesting. Let me try to think of a reason. Meanwhile,
>> are there any other LZO Hive samurais out there? Please help with some guidance.
>>
>>  sanjay
>>
>>   From: w00t w00t <w00tel@yahoo.de>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <
>> w00tel@yahoo.de>
>>  Date: Wednesday, August 14, 2013 1:15 AM
>>
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: Hive and Lzo Compression
>>
>>
>>  Thanks for your reply.
>>
>>  The interesting thing I experience is that the SELECT query still works
>> - even when I do not specify the STORED AS clause... that puzzles me a bit.
>>
>>   ------------------------------
>> *From:* Sanjay Subramanian <Sanjay.Subramanian@wizecommerce.com>
>> *To:* "user@hive.apache.org" <user@hive.apache.org>; w00t w00t <
>> w00tel@yahoo.de>
>> *Sent:* 3:44 Wednesday, 14 August 2013
>> *Subject:* Re: Hive and Lzo Compression
>>
>>  Hi
>>
>>  I think CREATE TABLE without the STORED AS clause will not give any
>> errors while creating the table.
>> However, when you query that table, since it contains .lzo
>> files, you would get errors.
>> With external tables, you are separating the table creation (definition)
>> from the data. So only when that table is queried might Hive report
>> errors.
>>
>>  LZO compression rocks! I am so glad I used it in our projects here.
>>
>>  Regards
>>
>>  sanjay
>>
>>   From: w00t w00t <w00tel@yahoo.de>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <
>> w00tel@yahoo.de>
>> Date: Tuesday, August 13, 2013 12:13 AM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Re: Hive and Lzo Compression
>>
>>   Thanks for your replies and the link.
>>
>>  I could get it working, but wondered why the CREATE TABLE statement
>> worked without the STORED AS Clause as well...that's what puzzles me a
>> bit...
>>
>>  But I will use the STORED AS Clause to be on the safe side.
>>
>>
>>   ------------------------------
>> *From:* Lefty Leverenz <leftyleverenz@gmail.com>
>> *To:* user@hive.apache.org
>> *CC:* w00t w00t <w00tel@yahoo.de>
>> *Sent:* 19:06 Saturday, 10 August 2013
>> *Subject:* Re: Hive and Lzo Compression
>>
>>  I'm not seeing any documentation link in Sanjay's message, so here it
>> is again (in the Hive wiki's language manual):
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO.
>>
>>
>> On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian <
>> Sanjay.Subramanian@wizecommerce.com> wrote:
>>
>>  Please refer to the documentation here.
>> Let me know if you need more clarification so that we can make this
>> document better and more complete.
>>
>>  Thanks
>>
>>  sanjay
>>
>>   From: w00t w00t <w00tel@yahoo.de>
>> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <
>> w00tel@yahoo.de>
>> Date: Thursday, August 8, 2013 2:02 AM
>> To: "user@hive.apache.org" <user@hive.apache.org>
>> Subject: Hive and Lzo Compression
>>
>>
>>    Hello,
>>
>> I have started running Hive with Lzo compression on Hortonworks 1.2.
>>
>> I have managed to install/configure Lzo, and hive -e "set
>> io.compression.codecs" shows me the Lzo codecs:
>> io.compression.codecs=
>> org.apache.hadoop.io.compress.GzipCodec,
>> org.apache.hadoop.io.compress.DefaultCodec,
>> com.hadoop.compression.lzo.LzoCodec,
>> com.hadoop.compression.lzo.LzopCodec,
>> org.apache.hadoop.io.compress.BZip2Codec
>>
>> However, I have some questions where I would be happy if you could help
>> me.
>>
>> (1) CREATE TABLE statement
>>
>>  I read in different postings, that in the CREATE TABLE statement, I have
>> to use the following STORAGE clause:
>>
>>  CREATE EXTERNAL TABLE txt_table_lzo (
>>     txt_line STRING
>>  )
>>  ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
>>  STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
>> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>>  LOCATION '/user/myuser/data/in/lzo_compressed';
>>
>>  SELECT statements on this table with Lzo data now execute without any
>> problems.
>>
>>  However I also created a table on the same data without this STORAGE
>> clause:
>>
>>  CREATE EXTERNAL TABLE txt_table_lzo_tst (
>>     txt_line STRING
>>  )
>>  ROW FORMAT DELIMITED FIELDS TERMINATED BY '||||'
>>  LOCATION '/user/myuser/data/in/lzo_compressed';
>>
>>  The interesting thing is that it works as well when I execute a SELECT
>> statement on this table.
>>
>>  Can you explain why the second CREATE TABLE statement works as well?
>>  What should I use in DDLs?
>>  Is it best practice to use the STORED AS clause with a
>> "deprecatedLzoTextInputFormat"? Or should I remove it?
>>
>>
>> (2) Output and Intermediate Compression Settings
>>
>>  I want to use output compression.
>>
>>  In "Programming Hive" from Capriolo, Wampler, Rutherglen the following
>> commands are recommended:
>>  SET hive.exec.compress.output=true;
>>  SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>>  However, in some other forum posts I found the following
>> recommended settings:
>>  SET hive.exec.compress.output=true;
>>  SET mapreduce.output.fileoutputformat.compress=true;
>>  SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>>  Am I right that the first settings are for Hadoop versions prior to 0.23?
>>  Or is there any other reason why the settings are different?
>>
>>  I am using Hadoop 1.1.2 with Hive 0.10.0.
>>  Which settings would you recommend to use?
>>
>>  --------------
>>  I also want to compress intermediate results.
>>
>>  Again, in "Programming Hive" the following settings are
>> recommended:
>>  SET hive.exec.compress.intermediate=true;
>>  SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>>  Is this the right setting?
>>
>>  Or should I again use the settings (which look more valid for
>> Hadoop 0.23 and greater)?
>>  SET hive.exec.compress.intermediate=true;
>>  SET mapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>>
>> Thanks
>>
>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> ======================
>> This email message and any attachments are for the exclusive use of the
>> intended recipient(s) and may contain confidential and privileged
>> information. Any unauthorized review, use, disclosure or distribution is
>> prohibited. If you are not the intended recipient, please contact the
>> sender by reply email and destroy all copies of the original message along
>> with any attachments, from your computer system. If you are the intended
>> recipient, please be advised that the content of this message is subject to
>> access, review and disclosure by the sender's Email System Administrator.
>>
>>
>>
>>
>>  -- Lefty
>>
>>
>>
>
>
>
>  --
> Nitin Pawar
>



-- 
Nitin Pawar
