arrow-dev mailing list archives

From "Wes McKinney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ARROW-462) [C++] Implement in-memory conversions between non-nested primitive types and DictionaryArray equivalent
Date Fri, 06 Jan 2017 16:30:58 GMT

    [ https://issues.apache.org/jira/browse/ARROW-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804922#comment-15804922 ]

Wes McKinney commented on ARROW-462:
------------------------------------

One issue is the handling of the hash keys (e.g. strings). After the hash table pass, we want
to minimize the time needed to create the final dictionary and indices arrays. We can run
performance experiments and pick the approach that performs best while keeping the code simple.
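One way to keep the post-pass step cheap for string keys can be sketched in C++ as below. This is a minimal sketch under stated assumptions, not Arrow's implementation: `DictionaryEncodeStrings`, `StringDictionaryResult`, and `DictionaryValue` are hypothetical names, and `std::vector`/`std::string` stand in for Arrow buffers. Each new unique string is appended to contiguous offsets and data buffers the first time it is seen, so when the hash table pass ends the dictionary already has an Arrow-style binary layout and the indices array is complete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical result type: dictionary in Arrow-style binary layout
// (int32 offsets + concatenated bytes) plus one index per input value.
struct StringDictionaryResult {
  std::vector<int32_t> dict_offsets;  // number of unique values + 1 entries
  std::string dict_data;              // bytes of unique values, back to back
  std::vector<int32_t> indices;       // dictionary index for each input value
};

// Single hash table pass: unique strings are written to the dictionary
// buffers as soon as they are first seen, so no second copy is needed later.
StringDictionaryResult DictionaryEncodeStrings(
    const std::vector<std::string>& values) {
  StringDictionaryResult out;
  out.dict_offsets.push_back(0);
  out.indices.reserve(values.size());
  // Assumption for simplicity: the hash table owns a copy of each key.
  // A tuned version might hash views into dict_data to avoid this copy.
  std::unordered_map<std::string, int32_t> memo;
  for (const std::string& v : values) {
    // memo.size() is evaluated before insertion, giving the next free index.
    auto [it, inserted] =
        memo.try_emplace(v, static_cast<int32_t>(memo.size()));
    if (inserted) {
      out.dict_data.append(v);
      out.dict_offsets.push_back(static_cast<int32_t>(out.dict_data.size()));
    }
    out.indices.push_back(it->second);
  }
  return out;
}

// Returns dictionary entry i as a view into the contiguous data buffer.
std::string_view DictionaryValue(const StringDictionaryResult& r, int32_t i) {
  assert(i >= 0 && static_cast<std::size_t>(i) + 1 < r.dict_offsets.size());
  return std::string_view(r.dict_data)
      .substr(r.dict_offsets[i], r.dict_offsets[i + 1] - r.dict_offsets[i]);
}
```

For input `{"a", "bb", "a", "ccc", "bb"}` this produces indices `{0, 1, 0, 2, 1}`, data `"abbccc"`, and offsets `{0, 1, 3, 6}`. Whether the extra key copy in the hash table matters is exactly the kind of question the performance experiments would settle.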


> [C++] Implement in-memory conversions between non-nested primitive types and DictionaryArray equivalent
> -------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-462
>                 URL: https://issues.apache.org/jira/browse/ARROW-462
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>
> We use a hash table to extract unique values and dictionary indices. There may be an
> opportunity to consolidate common code with the dictionary encoding implementation in
> parquet-cpp (though the dictionary indices will not be run-length encoded in Arrow):
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/encodings/dictionary-encoding.h
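For a non-nested primitive type, both conversions named in the issue can be sketched as below. This is an illustrative sketch, not the Arrow or parquet-cpp API: `DictionaryEncoded`, `DictionaryEncode`, and `DictionaryDecode` are hypothetical names, `std::vector` stands in for Arrow arrays, int32 indices are an assumption, and null values are not handled. A hash table assigns each unique value an index in first-seen order; decoding gathers dictionary values by index. As the issue notes, the indices stay a plain array rather than being run-length encoded as in Parquet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory stand-in for a DictionaryArray over type T.
template <typename T>
struct DictionaryEncoded {
  std::vector<T> dictionary;     // unique values in first-seen order
  std::vector<int32_t> indices;  // plain (not run-length encoded) indices
};

// Dense values -> (dictionary, indices) using one hash table pass.
template <typename T>
DictionaryEncoded<T> DictionaryEncode(const std::vector<T>& values) {
  DictionaryEncoded<T> out;
  out.indices.reserve(values.size());
  std::unordered_map<T, int32_t> memo;  // value -> dictionary index
  for (const T& v : values) {
    auto [it, inserted] =
        memo.try_emplace(v, static_cast<int32_t>(out.dictionary.size()));
    if (inserted) out.dictionary.push_back(v);
    out.indices.push_back(it->second);
  }
  return out;
}

// (dictionary, indices) -> dense values by gathering dictionary[index].
template <typename T>
std::vector<T> DictionaryDecode(const DictionaryEncoded<T>& encoded) {
  std::vector<T> values;
  values.reserve(encoded.indices.size());
  for (int32_t idx : encoded.indices) {
    assert(idx >= 0 &&
           static_cast<std::size_t>(idx) < encoded.dictionary.size());
    values.push_back(encoded.dictionary[idx]);
  }
  return values;
}
```

For example, encoding `{7, 3, 7, 7, 9, 3}` yields dictionary `{7, 3, 9}` and indices `{0, 1, 0, 0, 2, 1}`, and decoding restores the input. One caveat for floating-point `T`: NaN compares unequal to itself, so each NaN would get its own dictionary entry in this sketch; a real implementation needs an explicit policy for that.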



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
