orc-dev mailing list archives

From "Charles Pritchard (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ORC-41) Using referenced columns for improved compression
Date Wed, 10 Feb 2016 01:46:18 GMT
Charles Pritchard created ORC-41:

             Summary: Using referenced columns for improved compression
                 Key: ORC-41
                 URL: https://issues.apache.org/jira/browse/ORC-41
             Project: Orc
          Issue Type: Improvement
            Reporter: Charles Pritchard

Many data sets I work with have one column that essentially references another, where one
column is a bigint and the other is a string. It is always the case that the value of
the integer field determines the value of the string field.

I also work with data sets where one bigint field always determines the value of
another bigint field, likely as part of a tree structure.

There is an opportunity to achieve better compression by identifying these cases and adding
logic for such cross-column/dictionary lookups.
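
To make the idea concrete, here is a minimal Python sketch of such a cross-column lookup. It is
not ORC code and does not reflect any existing ORC API; the function names and the sample data
are hypothetical. It assumes the writer can see both columns at once: it checks whether the
referencing column determines the dependent column and, if so, keeps one dictionary entry per
distinct key instead of one dependent value per row.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def build_reference_dictionary(
    keys: Sequence[Hashable], values: Sequence[Hashable]
) -> Optional[Dict[Hashable, Hashable]]:
    """Return a key -> value mapping if every key always pairs with the
    same value; return None if the dependency does not hold."""
    if len(keys) != len(values):
        raise ValueError("columns must have the same number of rows")
    mapping: Dict[Hashable, Hashable] = {}
    for key, value in zip(keys, values):
        # setdefault stores the first value seen for a key and returns it;
        # any later mismatch means the key does not determine the value.
        if mapping.setdefault(key, value) != value:
            return None
    return mapping


def encode_dependent_column(
    keys: Sequence[Hashable], values: Sequence[Hashable]
) -> Tuple[List[Hashable], Dict[Hashable, Hashable]]:
    """Keep the key column plus a per-key dictionary (hypothetical layout);
    the dependent column itself is no longer stored row by row."""
    mapping = build_reference_dictionary(keys, values)
    if mapping is None:
        raise ValueError("keys do not determine values; store the column normally")
    return list(keys), mapping


def decode_dependent_column(
    keys: Sequence[Hashable], mapping: Dict[Hashable, Hashable]
) -> List[Hashable]:
    """Rebuild the dependent column by looking up each row's key."""
    return [mapping[key] for key in keys]


if __name__ == "__main__":
    # Made-up sample rows: a bigint id column and the string column it determines.
    ids = [7, 3, 7, 7, 3]
    names = ["alpha", "beta", "alpha", "alpha", "beta"]
    stored_keys, lookup = encode_dependent_column(ids, names)
    print(len(lookup), "dictionary entries for", len(names), "rows")
    print(decode_dependent_column(stored_keys, lookup) == names)
```

The same check covers the bigint-to-bigint case, since the keys and values only need to be
hashable. Falling back to normal storage when the dependency fails is a design choice in this
sketch, not something the proposal specifies.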

