hive-issues mailing list archives

From "Szehon Ho (JIRA)" <>
Subject [jira] [Updated] (HIVE-11912) Make snappy compression default for parquet tables
Date Tue, 22 Sep 2015 01:01:04 GMT


Szehon Ho updated HIVE-11912:
    Attachment: HIVE-11912.patch

Attaching a patch.  Unfortunately, Serde has no extension point today for specifying default
table properties, the way StorageHandler does.

The most logical place seemed to be the StorageFormat abstraction, which is the rough equivalent
of StorageHandler.  By putting it there instead of in AbstractSerde, we don't have to waste
time initializing the Serde.
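
To illustrate the idea, here is a minimal sketch of a storage format descriptor that carries its own default table properties. It is not the code in the attached patch: StorageFormatSketch, ParquetFormatSketch and TablePropertyDefaults are hypothetical names, and the "parquet.compression" key is an assumption made for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a storage format descriptor; not Hive's actual interface.
interface StorageFormatSketch {
    String name();

    // Defaults the format contributes to new tables; empty unless overridden.
    default Map<String, String> defaultTableProperties() {
        return Collections.emptyMap();
    }
}

// Hypothetical Parquet descriptor that contributes a Snappy default.
class ParquetFormatSketch implements StorageFormatSketch {
    @Override
    public String name() {
        return "PARQUET";
    }

    @Override
    public Map<String, String> defaultTableProperties() {
        // The property key is an assumption made for this sketch.
        return Collections.singletonMap("parquet.compression", "SNAPPY");
    }
}

class TablePropertyDefaults {
    // Merge the format's defaults with user-supplied properties; user values win.
    static Map<String, String> effectiveProperties(StorageFormatSketch format,
                                                   Map<String, String> userProperties) {
        Map<String, String> merged = new LinkedHashMap<>(format.defaultTableProperties());
        merged.putAll(userProperties);
        return merged;
    }
}
```

Nothing in this sketch instantiates a Serde: the defaults are read straight from the format descriptor, and a table that sets the property explicitly keeps its own value.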

Also, there are some issues on macOS.  To work around the Mac / Snappy issue for Snappy versions
< 1.0.5, as is the case for Hadoop 2.6, HADOOP_OPTS should be set like this:
export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp"

> Make snappy compression default for parquet tables
> --------------------------------------------------
>                 Key: HIVE-11912
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>            Reporter: Szehon Ho
>         Attachments: HIVE-11912.patch
> Snappy is a popular compression codec for Parquet and is the default in many Parquet
> applications, which improves performance.
> This change would make it the default for new Hive Parquet tables.
