hive-user mailing list archives

From Bejoy Ks <bejoy...@yahoo.com>
Subject Re: Compressed data storage in HDFS - Error
Date Wed, 06 Jun 2012 09:45:47 GMT
Hi Sreenath

The LZO error occurs because the LZO libraries are not present in the Hadoop_Home/lib/native folder.
You need to build and package LZO for the OS you are using.

As you mentioned, compression adds the overhead of decompressing records while they are
processed. HDFS is used to store large amounts of data, so compression saves a lot of storage space
(consider replication as well). However, it is not compression of the final output that speeds up
MapReduce jobs; it is the intermediate compression that has this advantage. Intermediate compression
means compression of the map output. In a MapReduce job a lot of data is copied and shuffled
between the map and reduce phases. When this intermediate data is compressed, that step
is faster because it consumes much less IO.


The following properties enable intermediate compression:
mapred.compress.map.output=true
mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec
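
As a rough sketch, these properties could be set for a single Hive session before running a query that has a reduce phase. This assumes the LZO codec jar and native libraries are already installed on every node. The column name some_col is hypothetical, and hive.exec.compress.intermediate is an additional Hive property not mentioned above, which compresses data passed between the jobs of a multi-stage query.

```sql
-- Compress map output (the data shuffled from mappers to reducers).
-- Assumes com.hadoop.compression.lzo.LzoCodec is on the classpath of all nodes.
SET mapred.compress.map.output=true;
SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;

-- Assumption: also compress data handed between stages of a multi-job Hive query.
SET hive.exec.compress.intermediate=true;

-- A query with a shuffle, so the settings above actually apply.
-- some_col is a hypothetical column of table sample.
SELECT some_col, COUNT(*) FROM sample GROUP BY some_col;
```

A plain "select *" insert normally runs as a map-only job, so it has no shuffle and would not benefit from map output compression.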


Regards
Bejoy KS



________________________________
 From: Siddharth Tiwari <siddharth.tiwari@live.com>
To: "user@hive.apache.org " <user@hive.apache.org> 
Sent: Wednesday, June 6, 2012 2:58 PM
Subject: RE: Compressed data storage in HDFS - Error
 

There is something you gain and something you lose.
Compression reduces IO at the cost of increased CPU work. You will also see different
results for different tasks, i.e. HDFS read, HDFS write, shuffle and sort. So whether to use
compression or not depends on your usage.
Sent from my N8




-----Original Message----- 
From: Sreenath Menon 
Sent: 6/6/2012 8:50:23 AM 
To: user@hive.apache.org 
Subject: Compressed data storage in HDFS - Error 
I would like to compress my data in HDFS using some Hive commands.
Steps followed (data already resides in table sample):

create table rc_lzo like sample;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
insert overwrite table rc_lzo select * from sample;

Error:
Compression codec com.hadoop.compression.lzo.LzoCodec was not found

1) What do I need to do to use LZO as well as other compression methods?

2) I heard somewhere that using compressed data will produce better results than uncompressed
data in some cases. How can this be, given that compression methods always add compression and
decompression time? Is there any truth in this, and if so, how? I can understand
how there are better results when using compression between mappers and reducers and in between
MapReduce jobs.

Thanks and Regards
Sreenath Mullassery