hive-user mailing list archives

From Bobby Rullo <>
Subject Re: Hive ignores key when reading sequence files?
Date Fri, 09 Oct 2009 00:52:25 GMT

I'll take a look at that.

It seems the easiest approach would be to subclass
SequenceFileInputFormat and override getRecordReader() to return a
RecordReader that wraps SequenceFileRecordReader and overrides

Is it safe to assume that K and V are both Text writables, so that I
can just append the bytes of one to the other?
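If both K and V are Text, appending the bytes works, with one caveat: Hadoop's `Text.getBytes()` returns the backing array, which is only valid up to `Text.getLength()`, so the copy must use the length rather than the whole array. The sketch below is stdlib-only so it runs without Hadoop: plain `byte[]` arrays with explicit lengths stand in for `Text`. It assumes the table keeps Hive's default SerDe, whose field delimiter is Ctrl-A (`0x01`), so the merged row parses as the two columns `key` and `value`. `split` is the inverse step that a custom output format would need. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyValueBytes {
    /** Hive's default field delimiter (Ctrl-A, 0x01); assumes the table uses the default SerDe. */
    static final byte FIELD_DELIM = 0x01;

    /**
     * Returns key[0..keyLen) + FIELD_DELIM + value[0..valueLen).
     * The explicit lengths mirror Text.getLength(): the array returned by
     * Text.getBytes() can be longer than the valid data.
     */
    static byte[] merge(byte[] key, int keyLen, byte[] value, int valueLen) {
        byte[] row = new byte[keyLen + 1 + valueLen];
        System.arraycopy(key, 0, row, 0, keyLen);
        row[keyLen] = FIELD_DELIM;
        System.arraycopy(value, 0, row, keyLen + 1, valueLen);
        return row;
    }

    /** Inverse of merge: splits at the first FIELD_DELIM (assumes keys never contain 0x01). */
    static byte[][] split(byte[] row) {
        for (int i = 0; i < row.length; i++) {
            if (row[i] == FIELD_DELIM) {
                return new byte[][] {
                    Arrays.copyOfRange(row, 0, i),
                    Arrays.copyOfRange(row, i + 1, row.length)
                };
            }
        }
        throw new IllegalArgumentException("no field delimiter in row");
    }

    public static void main(String[] args) {
        byte[] key = "user42".getBytes(StandardCharsets.UTF_8);
        // Simulates a reused buffer: only the first 7 bytes ("clicked") are valid.
        byte[] valueBuf = Arrays.copyOf("clicked".getBytes(StandardCharsets.UTF_8), 16);
        byte[] row = merge(key, key.length, valueBuf, 7);
        System.out.println(new String(row, StandardCharsets.UTF_8).replace('\u0001', '|'));
        byte[][] parts = split(row);
        System.out.println(new String(parts[0], StandardCharsets.UTF_8) + " / "
                + new String(parts[1], StandardCharsets.UTF_8));
    }
}
```

Running `main` prints `user42|clicked` and then `user42 / clicked`. The stale trailing bytes in `valueBuf` never reach the row.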

On Oct 6, 2009, at 8:10 PM, Zheng Shao wrote:

> Hi Bobby,
> We just need a special FileInputFormat: it should be able to read a
> SequenceFile and then prepend the key to the value before returning
> the record to the Hive framework.
> Then, in HiveQL, we can say:
> add jar my.jar;
> CREATE TABLE mytable (key STRING, value STRING)
> '';
> See
> You may also want to write your own OutputFormat that splits the row
> passed in into key and value and stores them separately. But that is
> not needed unless you want to use Hive to INSERT into this table
> (LOAD does NOT need this).
> Zheng
> On Tue, Oct 6, 2009 at 6:19 PM, Bobby Rullo <> wrote:
> Hi there,
> It seems that Hive ignores the key when reading Hadoop sequence
> files. Is there a way to make it not do that?
> If there's no way to do this with a 'stock' Hive build, could
> someone point me to the code that reads sequence files in Hive so I
> can have a go at it? It's sort of a show-stopper for us: we have a
> bunch of large files where the key field is important.
> Bobby
> -- 
> Yours,
> Zheng
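Zheng's suggestion, and the wrapping approach Bobby describes above, amounts to a decorator around the sequence-file reader. The sketch below is stdlib-only so it runs without Hadoop, and every name in it is hypothetical. `PairReader` is a stand-in loosely modeled on the old `org.apache.hadoop.mapred.RecordReader` contract (`createKey()`, `createValue()`, and `next(key, value)` filling caller-owned holders and returning false at end of input); it omits that interface's `getPos()`, `getProgress()` and `close()`. `StringBuilder` stands in for `Text`, and `InMemoryReader` stands in for `SequenceFileRecordReader`. The Ctrl-A separator assumes the table uses Hive's default SerDe.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class KeyPrependingReaderSketch {

    /** Hypothetical stand-in for the old-API RecordReader: fills caller-owned holders, false at EOF. */
    interface PairReader {
        StringBuilder createKey();
        StringBuilder createValue();
        boolean next(StringBuilder key, StringBuilder value);
    }

    /** In-memory reader standing in for SequenceFileRecordReader. */
    static class InMemoryReader implements PairReader {
        private final Iterator<Map.Entry<String, String>> it;

        InMemoryReader(List<Map.Entry<String, String>> records) {
            this.it = records.iterator();
        }

        public StringBuilder createKey() { return new StringBuilder(); }
        public StringBuilder createValue() { return new StringBuilder(); }

        public boolean next(StringBuilder key, StringBuilder value) {
            if (!it.hasNext()) return false;
            Map.Entry<String, String> e = it.next();
            key.setLength(0);
            key.append(e.getKey());
            value.setLength(0);
            value.append(e.getValue());
            return true;
        }
    }

    /** Decorator: delegates to the inner reader, then rewrites value as key + '\u0001' + value. */
    static class KeyPrependingReader implements PairReader {
        private final PairReader inner;
        private final StringBuilder innerKey; // allocated once, reused across next() calls

        KeyPrependingReader(PairReader inner) {
            this.inner = inner;
            this.innerKey = inner.createKey();
        }

        public StringBuilder createKey() { return new StringBuilder(); }
        public StringBuilder createValue() { return inner.createValue(); }

        public boolean next(StringBuilder key, StringBuilder value) {
            if (!inner.next(innerKey, value)) return false;
            value.insert(0, innerKey.toString() + '\u0001');
            key.setLength(0); // the outer key is left empty, since Hive only uses the value
            return true;
        }
    }

    public static void main(String[] args) {
        PairReader r = new KeyPrependingReader(new InMemoryReader(List.of(
                Map.entry("k1", "v1"), Map.entry("k2", "v2"))));
        StringBuilder k = r.createKey();
        StringBuilder v = r.createValue();
        while (r.next(k, v)) {
            System.out.println(v.toString().replace('\u0001', '|'));
        }
    }
}
```

Running `main` prints `k1|v1` and `k2|v2`. Keeping the prepending in a wrapper leaves the underlying reader untouched. In a real input format, the wrapper would be what the overridden `getRecordReader()` returns.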
