hive-issues mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13275) Add a toString method to BytesRefArrayWritable
Date Sun, 13 Mar 2016 15:24:33 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192388#comment-15192388 ]

Harsh J commented on HIVE-13275:
--------------------------------

A couple of caveats, though:
# While RCFile may natively support columns with newline characters in their data, the toString representation used for Hadoop Streaming will likely not handle them well, because Streaming works with line-oriented text.
# If the bytes are stored in a form other than plain text in future (for example Avro or Protobuf), the toString representation will no longer be directly useful.
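Both caveats can be reproduced with a small, dependency-free Java sketch. It is a hypothetical illustration, not code from Hive or the attached patch: `StreamingCaveats`, `lineCount` and `lossyAsUtf8Text` are invented names, the record text assumes CSV-style quoting, and the second check assumes column bytes are turned into a String by decoding them as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical illustration of the two caveats (names are invented):
 * 1. a consumer that reads one record per line splits a quoted field that
 *    contains a newline into two "records";
 * 2. column bytes that are not UTF-8 text (e.g. a binary encoding) are
 *    altered by a decode/re-encode round trip through a Java String.
 */
final class StreamingCaveats {
    private StreamingCaveats() {}

    /** Number of lines a line-oriented reader would see in the given text. */
    static int lineCount(String text) {
        // Limit -1 keeps trailing empty strings, so every '\n' adds a line.
        return text.split("\n", -1).length;
    }

    /** True if decoding the bytes as UTF-8 and re-encoding them changes them. */
    static boolean lossyAsUtf8Text(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        byte[] roundTrip = decoded.getBytes(StandardCharsets.UTF_8);
        return !Arrays.equals(bytes, roundTrip);
    }

    public static void main(String[] args) {
        // One logical CSV record whose second field contains a newline.
        String record = "\"id-1\",\"first line\nsecond line\"";
        System.out.println(lineCount(record)); // 2 lines for 1 record

        // Bytes that are not valid UTF-8, as a binary encoding may produce.
        byte[] binary = {(byte) 0xC3, (byte) 0x28, (byte) 0xFF};
        System.out.println(lossyAsUtf8Text(binary)); // true: malformed bytes become U+FFFD
    }
}
```

Quoting keeps the embedded newline inside a single CSV field, but a script that reads its input one line at a time still sees two fragments, which is why the first caveat cannot be fixed by the output format alone.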

> Add a toString method to BytesRefArrayWritable
> ----------------------------------------------
>
>                 Key: HIVE-13275
>                 URL: https://issues.apache.org/jira/browse/HIVE-13275
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats, Serializers/Deserializers
>    Affects Versions: 1.1.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Trivial
>         Attachments: HIVE-13275.000.patch
>
>
> RCFileInputFormat cannot be used externally for Hadoop Streaming today, because Streaming generally relies on the K/V pairs being able to emit text representations (via toString()).
> Since BytesRefArrayWritable has no toString() method, using RCFileInputFormat produces default object representations (class name and hash code), which are not useful.
> Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an array), so it's important to output them in a valid, parseable manner rather than simply joining the string representations of the inner elements with a delimiter.
> I propose adding a standardised CSV formatting of the array data, so that users of Streaming can then parse the results in their own script. Since we already have OpenCSV as a dependency, we can make use of it for this purpose.
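The proposed CSV output can be sketched in plain Java. This is a hypothetical illustration, not the attached patch: a row is modelled as a `List<byte[]>` standing in for `BytesRefArrayWritable`, each column is assumed to hold UTF-8 text, and the RFC 4180-style quoting is hand-rolled to stay dependency-free where the proposal would delegate to OpenCSV. `CsvRowFormatter`, `quote` and `formatRow` are invented names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: render one RCFile row (modelled as a list of byte[]
 * columns, standing in for BytesRefArrayWritable) as a single CSV record.
 * Every field is quoted and embedded quotes are doubled (RFC 4180 style).
 */
final class CsvRowFormatter {
    private CsvRowFormatter() {}

    /** Wraps a field in double quotes, doubling any embedded quote characters. */
    static String quote(String field) {
        return '"' + field.replace("\"", "\"\"") + '"';
    }

    /** Decodes each column as UTF-8 (an assumption) and joins the quoted fields with commas. */
    static String formatRow(List<byte[]> columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String text = new String(columns.get(i), StandardCharsets.UTF_8);
            sb.append(quote(text));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<byte[]> row = Arrays.asList(
                "alice".getBytes(StandardCharsets.UTF_8),
                "a,b".getBytes(StandardCharsets.UTF_8),
                "say \"hi\"".getBytes(StandardCharsets.UTF_8));
        System.out.println(formatRow(row));
    }
}
```

Running `main` prints `"alice","a,b","say ""hi"""`. A plain comma join of the same columns would give `alice,a,b,say "hi"`, which a Streaming script cannot split back into the original three values.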



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
