hive-user mailing list archives

From Matias Silva <>
Subject Re: Google Protocol Buffers and Hive
Date Fri, 02 Sep 2011 18:57:06 GMT
Hi Valentina, thanks for your response.  Do you think I can still partition the data if I use external tables?  I do like
the external table idea because it saves me from having to do an additional import of the data into Hive after loading
it into HDFS.  Plus it saves on space.
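For what it's worth, Hive does let you declare PARTITIONED BY on an EXTERNAL table and then attach existing HDFS directories as partitions.  A minimal sketch (the table, columns, and paths here are made-up placeholders, not from the thread):

```sql
-- Hypothetical partitioned external table; names and paths are illustrative.
CREATE EXTERNAL TABLE events (
  user_id BIGINT,
  action  STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/events';

-- Register an existing HDFS directory as a partition; no data is copied.
ALTER TABLE events ADD PARTITION (dt='2011-09-01')
LOCATION '/data/events/dt=2011-09-01';
```

Because the table is external, dropping the table or a partition only removes the metastore entry; the files stay in HDFS.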

How is the performance using GPB/Hive?

Another option is to use Pig with Elephant-Bird to read the GPB files, write them out in a tab-delimited, plain-text format,
and import that into Hive.  This would be a copy of the data, but it might be cleaner.
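If you go the conversion route, the Hive side of the import could look something like this (a sketch; the table name and HDFS path are assumptions, not from the thread):

```sql
-- Hypothetical table over the tab-delimited output of the Pig job.
CREATE TABLE events_text (
  user_id BIGINT,
  action  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Move the converted files into the table's warehouse directory.
LOAD DATA INPATH '/tmp/gpb_converted' INTO TABLE events_text;
```

Note that LOAD DATA INPATH moves the files rather than copying them, so the only extra copy is the Pig output itself.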


On Sep 2, 2011, at 9:43 AM, valentina kroshilina wrote:

> I use MR to generate tables using Elephant-Bird's OutputFormat. Hive
> can read from EXTERNAL tables using ProtobufHiveSerde and
> ProtobufBlockInputFormat generated by Elephant-Bird. Create table
> statement looks like the following:
> (
> ...
> )
> ROW FORMAT SERDE 'elephantbird.proto.hive.serde.LzoXXXProtobufHiveSerde'
> STORED AS INPUTFORMAT 'elephantbird.proto.mapred.input.DeprecatedLzoXXXProtobufBlockInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
> So the solution is to use external tables.
> Let me know if it helps.
> On Thu, Sep 1, 2011 at 8:45 PM, Matias Silva <> wrote:
>> Hi Everyone, is there any documentation regarding importing
>> Google Protocol Buffer files into Hive?  I've been scouring the internet,
>> and the closest thing I came across was something from Elephant-Bird
>> where I can load the GPB file using Pig, store it in a plain-text
>> format, and then load it into Hive.  It would be great if I could load
>> GPB directly into Hive.
>> Any pointers?
>> Thanks for your time and knowledge,
>> Matt

Matias Silva   [Sr. Data Warehouse Developer]
p 949.861.8888 x1420      f 949.861.8990
