hadoop-hive-dev mailing list archives

From "Steve Corona (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-333) Add TFileTransport deserializer
Date Sat, 07 Mar 2009 22:52:56 GMT
Add TFileTransport deserializer

                 Key: HIVE-333
                 URL: https://issues.apache.org/jira/browse/HIVE-333
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Serializers/Deserializers
         Environment: Linux
            Reporter: Steve Corona

I've been googling around all night and haven't really found what I am looking for. Basically,
I want to transfer some data from my web servers to Hive in a format that's a little more
verbose than plain CSV files. It seems like JSON or Thrift would be perfect for this. I am
planning on sending this serialized JSON or Thrift data through Scribe and loading it into
Hive. I just can't figure out how to tell Hive that the input data is a bunch of serialized
Thrift records (all of the records are the "struct" type) in a TFileTransport. Hopefully
this makes sense...

Reply from Joydeep Sen Sarma (jssarma@facebook.com)

Unfortunately, the open source code base does not have the loaders we run to convert Thrift
records in a TFileTransport into a SequenceFile that Hadoop/Hive can work with. One option
is that we add this to the Hive code base (should be straightforward).

No process required. Please file a JIRA. I will try to upload a patch this weekend (just
cut'n'paste for the most part). I would appreciate some help in finessing it (the internal
code is hardwired to some assumptions, etc.).
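
As a rough illustration of the first step such a loader has to perform, the sketch below
splits a byte stream into individual serialized records so that each one could then be
written out as a separate value. It is not the internal loader mentioned above. The framing
is an assumption made for the example: each record is taken to be preceded by a 4-byte
little-endian length, and a zero length is treated as padding. The actual TFileTransport
on-disk layout should be checked against the Thrift source before relying on this. The
names iter_framed_records and frame are hypothetical helpers.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_framed_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield raw record payloads from a length-prefixed byte stream.

    ASSUMPTION: every record is stored as a 4-byte little-endian signed
    length followed by that many payload bytes. This framing is chosen
    for illustration and is not confirmed by the thread.
    """
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return  # end of stream (or a trailing partial header)
        (size,) = struct.unpack("<i", header)
        if size == 0:
            continue  # ASSUMPTION: a zero length marks padding; skip it
        if size < 0:
            raise ValueError("negative record length, stream is likely corrupt")
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated record")
        yield payload


def frame(payload: bytes) -> bytes:
    """Hypothetical helper: wrap a payload in the framing assumed above."""
    return struct.pack("<i", len(payload)) + payload


if __name__ == "__main__":
    # Two records with four bytes of zero padding between them.
    data = frame(b"record-one") + b"\x00" * 4 + frame(b"record-two")
    for record in iter_framed_records(io.BytesIO(data)):
        print(len(record), record)
```

Each payload yielded here would still be an opaque serialized struct. Decoding it, or
storing it as a value in a SequenceFile, is a separate step that this sketch does not cover.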

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
