hive-dev mailing list archives

From "Harsh J (JIRA)" <>
Subject [jira] [Created] (HIVE-13275) Add a toString method to BytesRefArrayWritable
Date Sun, 13 Mar 2016 14:57:33 GMT
Harsh J created HIVE-13275:

             Summary: Add a toString method to BytesRefArrayWritable
                 Key: HIVE-13275
             Project: Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 1.1.0
            Reporter: Harsh J
            Assignee: Harsh J
            Priority: Trivial
         Attachments: HIVE-13275.000.patch

RCFileInputFormat cannot be used externally with Hadoop Streaming today, because Streaming generally
relies on the K/V pairs being able to emit text representations (via toString()).

Since BytesRefArrayWritable has no toString() method, using RCFileInputFormat falls back to the
default object representation, which is not useful as Streaming input.

Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an array), so it's
important to output them in a valid, parseable manner, as opposed to simply joining the string
representations of the inner elements with a delimiter.
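
To illustrate why a plain joining delimiter is not parseable, here is a minimal sketch. The class
and method names (NaiveJoinAmbiguity, naiveJoin) are hypothetical and are not part of Hive; plain
strings stand in for the column values of a row.

```java
import java.util.Arrays;
import java.util.List;

public class NaiveJoinAmbiguity {
    // Hypothetical helper: joins column values with a bare delimiter, no quoting or escaping.
    public static String naiveJoin(List<String> columns) {
        return String.join(",", columns);
    }

    public static void main(String[] args) {
        // A two-column row whose first value contains the delimiter...
        List<String> twoColumns = Arrays.asList("a,b", "c");
        // ...and a three-column row with no embedded delimiter.
        List<String> threeColumns = Arrays.asList("a", "b", "c");

        // Both rows produce the same text, so a consumer cannot recover the column boundaries.
        System.out.println(naiveJoin(twoColumns));
        System.out.println(naiveJoin(threeColumns));
    }
}
```

Both rows print as a,b,c, which is why a quoting-aware format is needed.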

I propose adding a standardised CSV formatting of the array data, such that users of Streaming
can then parse the results in their own script. Since we have OpenCSV as a dependency already,
we can make use of it for this purpose.
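
As a rough sketch of the kind of output proposed here (not the attached patch): the example below
formats a row of byte-array columns as one CSV line. It makes several assumptions: a plain byte[][]
stands in for the contents of a BytesRefArrayWritable, the bytes are decoded as UTF-8, every field
is quoted with embedded quotes doubled, and the escaping is hand-rolled so the example has no
dependencies, whereas the proposal is to use OpenCSV. The names CsvRowSketch and toCsvLine are
hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CsvRowSketch {
    // Hypothetical helper: renders each column as a double-quoted CSV field.
    // Embedded double quotes are escaped by doubling them.
    public static String toCsvLine(byte[][] columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            // Assumption: column bytes are UTF-8 text.
            String field = new String(columns[i], StandardCharsets.UTF_8);
            sb.append('"').append(field.replace("\"", "\"\"")).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[][] row = {
            "a,b".getBytes(StandardCharsets.UTF_8),
            "say \"hi\"".getBytes(StandardCharsets.UTF_8),
            "c".getBytes(StandardCharsets.UTF_8)
        };
        // prints: "a,b","say ""hi""","c"
        System.out.println(toCsvLine(row));
    }
}
```

A Streaming script can then read each line with any standard CSV parser to recover the columns,
including values that contain the delimiter.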

This message was sent by Atlassian JIRA