hive-dev mailing list archives

From "Chuck Connell (JIRA)" <>
Subject [jira] [Commented] (HIVE-2380) Add Binary Datatype in Hive
Date Fri, 30 Nov 2012 23:44:00 GMT


Chuck Connell commented on HIVE-2380:

I am trying to use this feature (BINARY columns), and I believe I have the perfect use case
for it, but I am missing something.

Here is the background. I have some files that each contain just one logical field, which
is a binary object. (The files are in Google Protobuf format.) I want to put these binary files
into a larger file, where each protobuf is a logical record. Then I want to define a Hive
table that stores each protobuf as one row, with the entire protobuf object in one BINARY
column. Then I will use a custom UDF to select/query the binary object.

This is about as simple a case as there can be for putting binary data into Hive. But all of
the test cases for this JIRA issue seem to draw the binary columns from another existing table
and CAST them. I want to load the files from disk.

What file format should I use to package the binary rows? What should the Hive table definition
be? I cannot use TEXTFILE, since the binary data may contain newlines. Many of my attempts have
choked on the newlines.
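
To make the newline problem concrete, here is a minimal sketch of one possible workaround: base64-encode each blob so that every record occupies exactly one text line. This is not taken from the issue or its test cases, and it says nothing about what Hive itself supports. The helper names pack_records and unpack_records are hypothetical, and the sketch assumes the consumer (for example, the custom UDF) would decode each line back into bytes.

```python
import base64


def pack_records(blobs):
    """Encode each binary blob as one base64 line.

    The standard base64 alphabet contains no newline characters, so the
    only newlines in the output are the record separators added here.
    """
    return "".join(base64.b64encode(b).decode("ascii") + "\n" for b in blobs)


def unpack_records(text):
    """Recover the original blobs, one per line."""
    return [base64.b64decode(line) for line in text.splitlines()]


# Stand-ins for serialized protobuf messages (assumed data, not real
# protobufs); note the raw newline bytes inside them.
blobs = [b"\x08\x96\x01\n\x00", b"\n\n\xff\xfe"]

packed = pack_records(blobs)
assert unpack_records(packed) == blobs
```

The trade-off is that base64 inflates the data by roughly a third, and every reader has to decode it before use.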

Thanks very much,
Chuck Connell
Burlington, MA

> Add Binary Datatype in Hive
> ---------------------------
>                 Key: HIVE-2380
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Serializers/Deserializers
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.8.0
>         Attachments: hive-2380_1.patch, hive-2380_2.patch, hive-2380_3.patch, hive-2380_4.patch,
> Add bytearray as a primitive data type.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
