Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Thu, 28 Mar 2013 21:11:15 +0000 (UTC)
From: "Owen O'Malley (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12639607.1364485129644.71978.1364505075666@arcas>
In-Reply-To: <JIRA.12639607.1364485129644@arcas>
References: <JIRA.12639607.1364485129644@arcas>
Subject: [jira] [Commented] (HIVE-4244) Make string dictionaries adaptive in
 ORC
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616657#comment-13616657 ] 

Owen O'Malley commented on HIVE-4244:
-------------------------------------

We should experiment with different values, but my guess is that the right cutover point for the heuristic is a loading of 2 to 3, i.e. an average of 2 to 3 occurrences per dictionary entry (50% to 33% distinct values).

We aren't really going to know whether the heuristic is right or wrong unless we compare both encodings, which is much too expensive. By taking a good guess after looking at the start of the stripe, we can get good performance most of the time.
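
To make the heuristic concrete, here is a rough sketch of the check, not the actual ORC writer code. The class name, method names, and the exact threshold of 2.0 are assumptions for illustration only; the 100,000-value sample size comes from the issue description. It computes the loading as sampled values divided by distinct values and picks dictionary encoding when the loading reaches the threshold.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the ORC writer. */
final class DictionaryHeuristic {
    /** Assumed cutover point: average occurrences per distinct value. */
    static final double MIN_LOADING = 2.0;
    /** Number of leading values to inspect, as proposed in the issue. */
    static final int SAMPLE_SIZE = 100_000;

    private DictionaryHeuristic() {}

    /** Loading (values per distinct value) over the first sampleSize values. */
    static double loading(List<String> values, int sampleSize) {
        int n = Math.min(values.size(), sampleSize);
        if (n == 0) {
            return 0.0; // nothing sampled, so no evidence for a dictionary
        }
        Set<String> distinct = new HashSet<>();
        for (String value : values.subList(0, n)) {
            distinct.add(value);
        }
        return (double) n / distinct.size();
    }

    /** True if the sample suggests dictionary encoding will pay off. */
    static boolean useDictionary(List<String> values) {
        return loading(values, SAMPLE_SIZE) >= MIN_LOADING;
    }
}
```

With this sketch, a column whose sample is six values drawn from two distinct strings has a loading of 3 and would be dictionary encoded, while a column of all-unique strings has a loading of 1 and would fall back to direct encoding.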
                
> Make string dictionaries adaptive in ORC
> ----------------------------------------
>
>                 Key: HIVE-4244
>                 URL: https://issues.apache.org/jira/browse/HIVE-4244
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Owen O'Malley
>            Assignee: Kevin Wilfong
>
> The ORC writer should adaptively switch between dictionary and direct encoding. I'd propose looking at the first 100,000 values in each column and deciding whether there is sufficient loading in the dictionary to use dictionary encoding.
