hive-user mailing list archives

From Tzur Turkenitz <tzur.turken...@gmail.com>
Subject Hive compression, piping data
Date Tue, 27 Aug 2013 10:33:12 GMT
Hi there Hive Groupers,

I have a question about Hive's architecture with regard to compression,
or more precisely, how Hive handles compressed tables when it reads from them.

Use Case:
1. Two compressed tables in HDFS, 1 TB each.
2. One table uses a splittable compression codec, while the other
does not.
3. A MapReduce program reads each table and writes a new text-only table
(uncompressed, around 4 TB).
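
A rough sketch of why the splittable/non-splittable distinction in this use
case matters for the number of map tasks. The file count (100 per table), the
128 MB split size, and the function name are assumptions for illustration
only; none of them come from an actual cluster, and real split computation
depends on the input format and configuration.

```python
import math

TB = 1024 ** 4
MB = 1024 ** 2


def estimate_map_tasks(total_bytes, num_files, split_bytes, splittable):
    """Rough estimate of input splits (map tasks) for one table.

    Hypothetical helper: assumes equally sized files and one map task
    per input split.
    """
    if not splittable:
        # A non-splittable file (e.g. gzip) has to be read from start to
        # finish by a single mapper, so there is one task per file.
        return num_files
    # Each file is divided into splits independently of the others.
    bytes_per_file = total_bytes / num_files
    return num_files * math.ceil(bytes_per_file / split_bytes)


# Assumed values: 100 files per 1 TB table, 128 MB split size.
splittable_tasks = estimate_map_tasks(1 * TB, 100, 128 * MB, splittable=True)
non_splittable_tasks = estimate_map_tasks(1 * TB, 100, 128 * MB, splittable=False)
print(splittable_tasks, non_splittable_tasks)
```

Under these assumptions the non-splittable table gets far fewer, much larger
map tasks, which is part of why I am asking where the data ends up on disk.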

What happens when MapReduce accesses the compressed tables?
* Is the data compressed on HDFS or in the local nodes' temp storage?
* Is the compressed data saved to disk or piped to the MapReduce job?
* In the shuffle phase, is the uncompressed data saved on the local
nodes?

Thank you so much!
