arrow-dev mailing list archives

From "Emilio Lahr-Vivaz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ARROW-542) [Java] Implement dictionaries in stream/file encoding
Date Thu, 09 Feb 2017 22:03:41 GMT

    [ https://issues.apache.org/jira/browse/ARROW-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860292#comment-15860292 ]

Emilio Lahr-Vivaz commented on ARROW-542:
-----------------------------------------

Another blocker I'm hitting is that I don't see any way to determine the type of a dictionary
block during read. DictionaryEncoding has an indexType, but that seems to refer to the integer
type used to reference the dictionary values: https://github.com/apache/arrow/blob/b99d049c3d1894908b7e52774eb657675dc1f439/format/Message.fbs#L165
A dictionary-encoded vector currently has its type defined as the dictionary index type,
but the type of the dictionary values is not defined anywhere. This works when the data is in
memory with the dictionary alongside it, but not when encoding to the file format. Possibly the
dictionary-encoded vector should specify the dictionary type? It seems that either that is
needed, or the message format needs another field for the dictionary type.
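
To make the first option concrete, here is a rough sketch of a field that declares the
dictionary value type and leaves the index type on the encoding. None of this is the Arrow
Java API: SketchType, SketchEncoding, SketchField and both helper methods are hypothetical
names, and the dictionary id on the encoding is an assumption made for illustration. The only
point is that a reader can then look up the value type of a dictionary block from the schema.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; none of these types exist in Arrow.
final class DictionaryTypeSketch {

    /** Stand-in for a logical type. */
    enum SketchType { INT8, INT16, INT32, UTF8 }

    /** Encoding metadata: which dictionary is used (assumed id) and how wide the indices are. */
    static final class SketchEncoding {
        final long id;
        final SketchType indexType;

        SketchEncoding(long id, SketchType indexType) {
            this.id = id;
            this.indexType = indexType;
        }
    }

    /** A field that declares the dictionary VALUE type; encoding is null if not dictionary encoded. */
    static final class SketchField {
        final String name;
        final SketchType valueType;
        final SketchEncoding encoding;

        SketchField(String name, SketchType valueType, SketchEncoding encoding) {
            this.name = name;
            this.valueType = valueType;
            this.encoding = encoding;
        }
    }

    /** Type of the buffers stored in a record batch for this field. */
    static SketchType storageTypeFor(SketchField field) {
        return field.encoding != null ? field.encoding.indexType : field.valueType;
    }

    /** Type a reader needs in order to decode the dictionary block with the given id. */
    static SketchType dictionaryTypeFor(List<SketchField> schema, long dictionaryId) {
        for (SketchField field : schema) {
            if (field.encoding != null && field.encoding.id == dictionaryId) {
                return field.valueType;
            }
        }
        throw new IllegalArgumentException("no field references dictionary " + dictionaryId);
    }

    public static void main(String[] args) {
        List<SketchField> schema = Arrays.asList(
            new SketchField("city", SketchType.UTF8, new SketchEncoding(1L, SketchType.INT16)),
            new SketchField("count", SketchType.INT32, null));

        // Batch buffers for "city" hold indices; the dictionary block holds the values.
        System.out.println("city storage type:  " + storageTypeFor(schema.get(0)));
        System.out.println("dictionary 1 type:  " + dictionaryTypeFor(schema, 1L));
    }
}
```

The other option, a separate dictionary type field in the message format, would move valueType
off the field and into the dictionary metadata instead; the lookup during read would be similar.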

> [Java] Implement dictionaries in stream/file encoding
> -----------------------------------------------------
>
>                 Key: ARROW-542
>                 URL: https://issues.apache.org/jira/browse/ARROW-542
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java - Vectors
>            Reporter: Emilio Lahr-Vivaz
>            Assignee: Emilio Lahr-Vivaz
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
