hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Parquet tables with snappy compression
Date Wed, 25 Jan 2017 22:13:20 GMT

> Has there been any study of how much compressing Hive Parquet tables with snappy reduces
> storage space or simply the table size in quantitative terms?

http://www.slideshare.net/oom65/file-format-benchmarks-avro-json-orc-parquet/20

Since Snappy is essentially an LZ77-style compressor with no entropy-coding stage, I would
expect it to help most when Parquet leaf columns contain text with large repeated substrings
(like URLs or log data).
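
To get a rough feel for that effect without a Hive cluster, here is a minimal sketch. It makes two assumptions: it uses Python's standard-library zlib (DEFLATE, which is LZ77 plus Huffman coding) as a stand-in for Snappy, so the absolute ratios will not match what Snappy gives, and the URL values are made-up sample data. Only the gap between the two inputs is meant to be illustrative.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return uncompressed size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


rng = random.Random(42)  # fixed seed so the run is repeatable

# URL-like values sharing a long common prefix (hypothetical sample data).
urls = "\n".join(
    f"http://example.com/catalog/item?id={rng.randrange(10**6)}"
    for _ in range(5000)
).encode("ascii")

# Same number of bytes, but with no repeated structure for LZ77 to match.
noise = bytes(rng.getrandbits(8) for _ in range(len(urls)))

url_ratio = compression_ratio(urls)
noise_ratio = compression_ratio(noise)
print(f"URL-like text: {url_ratio:.2f}x")
print(f"random bytes:  {noise_ratio:.2f}x")
```

The URL-like input should shrink several-fold while the random bytes barely change, which is the behaviour you would expect from any LZ77-family codec. For real numbers, measure the Parquet files themselves with Snappy enabled.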

If you want to experiment with that corner case, the L_COMMENT field from TPC-H lineitem is
a good compression-thrasher.

Cheers,
Gopal


