hive-user mailing list archives

From Steven Wong
Subject RE: hive storing a byte array
Date Tue, 24 May 2011 19:23:53 GMT
I claim no experience in storing blobs in Hive, but it sounds to me like using an array/list
will be quite inefficient in terms of both size and run time.

-----Original Message-----
From: Luke Forehand
Sent: Tuesday, May 24, 2011 7:31 AM
Subject: Re: hive storing a byte array


Thanks for your reply!  I have written it the way you mentioned, based on
an earlier post in this mailing list.  I'm concerned about having to
encode/decode the string in base64, and I'm wondering how much this will
impact my job run time.
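
For reference, the base64 round trip discussed here can be sketched as below. This is only an illustration under assumptions: the class name BlobBase64 and its methods are hypothetical, and it relies on java.util.Base64, which needs Java 8 or later (an older JVM would need a third-party codec instead). It shows the encode step before writing to a string column and the decode step inside the UDF; it does not measure the run time cost.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of Hive: round-trips serialized bytes
// through a base64 string so they can live in a Hive column of type string.
class BlobBase64 {

    // Encode raw bytes as base64 text; the text is roughly 4/3 the input size.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode the text read back from the string column into the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] payload = {0, 1, 2, (byte) 0xFF, 10, 13};
        String stored = encode(payload);      // value written to the string field
        byte[] restored = decode(stored);     // value recovered in the UDF
        System.out.println(stored);
        System.out.println(Arrays.equals(payload, restored));
    }
}
```

Both steps are a single linear pass over the data, so the cost grows with the size of the serialized object plus the roughly one-third increase in stored size.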

I have also written a UDF that emits a byte array, stored in a field of
type array<tinyint>.  When reading this field, the ObjectInspector is a
ListObjectInspector with primitiveJavaByte for the list elements.  Reading
this field in the UDF seems clunky because I have to iterate over the
list, reading each byte into a byte array, before I can use it.
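
The per-element copy described above might look like the following sketch. It is an assumption-laden illustration: the class TinyintListBytes is hypothetical, and the array<tinyint> value is modeled as a plain List<Byte> so the example stays self-contained. In a real UDF the list would be obtained through the ListObjectInspector rather than passed in directly.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: copies the elements of an
// array<tinyint> value, modeled here as a plain List<Byte>, into a byte[].
class TinyintListBytes {

    static byte[] toByteArray(List<Byte> elements) {
        byte[] out = new byte[elements.size()];
        int i = 0;
        for (Byte b : elements) {
            out[i++] = b; // unboxes each element; assumes no null elements
        }
        return out;
    }

    public static void main(String[] args) {
        List<Byte> field = Arrays.asList((byte) 1, (byte) -2, (byte) 3);
        System.out.println(Arrays.toString(toByteArray(field)));
    }
}
```

Each element in the list is a boxed Byte reference rather than a raw byte, which is one reason this representation can cost more in memory than a string of the same payload.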

Given both approaches, which one do you think has the least performance
impact?


On 5/23/11 6:59 PM, "Steven Wong" wrote:

>Hive does not support the blob data type. An option is to store your
>binary data encoded as a string (for example, using base64) and define
>the field in Hive as string.
>-----Original Message-----
>From: Luke Forehand
>Sent: Monday, May 23, 2011 1:21 PM
>Subject: hive storing a byte array
>Can someone please provide an example of how I can store a serialized
>object in a Hive field?  A field type of byte array or binary or
>blob is really what I was looking for, but if something slightly less
>trivial is involved, some instruction would be much appreciated.  This
>object is used in a custom UDF later on in the processing pipeline.
