hive-user mailing list archives

From "Connell, Chuck" <Chuck.Conn...@nuance.com>
Subject BINARY column type
Date Sat, 01 Dec 2012 16:50:47 GMT
I am trying to use BINARY columns and believe I have the perfect use case for them, but I am
missing something. Has anyone used this type for true binary data (which may contain newlines)?


Here is the background. I have some files that each contain just one logical field, which
is a binary object. (The files are in Google Protocol Buffers format.) I want to put these binary
files into a larger file, where each protobuf is a logical record. Then I want to define a Hive
table that stores each protobuf as one row, with the entire protobuf object in one BINARY
column. Finally, I will use a custom UDF to select/query the binary object.


This seems about as simple a case as there can be for putting binary data into Hive.


What file format should I use to package the binary rows? What should the Hive table definition
be? Which SerDe should I use (LazySimpleBinary?)? I cannot use TEXTFILE, since the binary data may
contain newlines, and many of my attempts have choked on those newlines.
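A minimal Python sketch (not Hive code) of why newline-delimited records fail for this data and how length-prefixed framing avoids the problem. Newlines are not a rare accident in protobuf output: field 1 with a length-delimited wire type is encoded with the tag byte 0x0A, which is `\n`. The 4-byte big-endian length prefix and the helper names `pack_records`/`unpack_records` are hypothetical illustrations, not part of Hive or the protobuf library, and the payloads are made-up bytes.

```python
import struct
from typing import Iterable, List


def pack_records(records: Iterable[bytes]) -> bytes:
    """Hypothetical helper: concatenate records, each preceded by a 4-byte big-endian length."""
    out = bytearray()
    for rec in records:
        out += struct.pack(">I", len(rec))
        out += rec
    return bytes(out)


def unpack_records(blob: bytes) -> List[bytes]:
    """Hypothetical helper: split a blob produced by pack_records back into its records."""
    records = []
    pos = 0
    while pos < len(blob):
        if pos + 4 > len(blob):
            raise ValueError("truncated length header")
        (length,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        if pos + length > len(blob):
            raise ValueError("truncated record")
        records.append(blob[pos:pos + length])
        pos += length
    return records


# Made-up payloads standing in for serialized protobufs; both contain b"\n" (0x0A).
payloads = [b"\x08\x96\x01\n\x03abc", b"\n\n\x00\xff"]

# Newline-delimited splitting, as a text-oriented reader does, breaks 2 records into 5 pieces.
naive = b"\n".join(payloads).split(b"\n")
print(len(naive))  # → 5

# Length-prefixed framing round-trips the records intact.
assert unpack_records(pack_records(payloads)) == payloads
```

The framing never inspects record contents, so any byte value, including 0x0A, is safe inside a record.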


Thank you,

Chuck Connell

Nuance

Burlington, MA

