hive-user mailing list archives

From Tzur Turkenitz
Subject Hive compression, piping data
Date Tue, 27 Aug 2013 10:33:12 GMT
Hi there Hive Groupers,

I've got a question about Hive's architecture with regard to compression,
or more precisely, how Hive treats compressed tables when it reads from them.

Use Case:
1. Two compressed tables in HDFS, 1 TB each.
2. One table is compressed with a splittable compression while the other
is not.
3. A MapReduce program reads each table and writes a new text-only table
(around 4 TB uncompressed).

What happens when MapReduce accesses the compressed tables:
* Is the data compressed on HDFS or in the local nodes' temp storage?
* Is the compressed data saved to disk or piped to the MapReduce job?
* In the shuffle phase, are we saving the uncompressed data on the local
disks?

Thank you so much!
