hive-user mailing list archives

From Bobby Rullo <bo...@metaweb.com>
Subject Re: Hive ignores key when reading sequence files?
Date Fri, 09 Oct 2009 00:52:25 GMT
Zheng,

I'll take a look at that.

It seems the easiest thing would be to subclass
SequenceFileInputFormat and override getRecordReader() to return a
RecordReader that wraps SequenceFileRecordReader and overrides
RecordReader.next(), right?
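
To make the wrapping idea concrete, here is a minimal sketch that runs without Hadoop on the classpath. Everything in it is a stand-in made up for illustration: Reader mimics the shape of the old-API RecordReader.next(key, value), ArrayReader plays the role of SequenceFileRecordReader, and StringBuilder stands in for a reusable Text object. The delimiter is also an assumption. Since the table is declared with two columns, the key and value probably need a field separator between them (Hive's default is Ctrl-A, '\001'); the demo uses a tab only so the output is printable.

```java
import java.util.ArrayList;
import java.util.List;

class KeyPrependingReaderSketch {

    // Hypothetical stand-in for the old-API RecordReader: next() fills the
    // caller's reusable holders and returns false at end of input.
    interface Reader {
        boolean next(StringBuilder key, StringBuilder value);
    }

    // Stand-in for SequenceFileRecordReader, backed by in-memory {key, value} pairs.
    static final class ArrayReader implements Reader {
        private final String[][] records;
        private int pos = 0;

        ArrayReader(String[][] records) {
            this.records = records;
        }

        @Override
        public boolean next(StringBuilder key, StringBuilder value) {
            if (pos >= records.length) return false;
            key.setLength(0);
            key.append(records[pos][0]);
            value.setLength(0);
            value.append(records[pos][1]);
            pos++;
            return true;
        }
    }

    // The wrapper: delegate to the inner reader, then rewrite the value as
    // key + delimiter + value so the key survives as the first field.
    static final class KeyPrependingReader implements Reader {
        private final Reader inner;
        private final char delimiter;

        KeyPrependingReader(Reader inner, char delimiter) {
            this.inner = inner;
            this.delimiter = delimiter;
        }

        @Override
        public boolean next(StringBuilder key, StringBuilder value) {
            if (!inner.next(key, value)) return false;
            value.insert(0, delimiter).insert(0, key);
            return true;
        }
    }

    // Drains a reader and returns the values a consumer would see.
    static List<String> readValues(Reader reader) {
        List<String> out = new ArrayList<>();
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        while (reader.next(key, value)) out.add(value.toString());
        return out;
    }

    public static void main(String[] args) {
        String[][] records = { {"k1", "v1"}, {"k2", "v2"} };
        Reader reader = new KeyPrependingReader(new ArrayReader(records), '\t');
        for (String row : readValues(reader)) System.out.println(row);
    }
}
```

In a real implementation the wrapper would presumably also forward createKey(), createValue(), getPos(), getProgress() and close() to the wrapped reader.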

Is it safe to assume that K,V are both Text writables, so I can just  
append the bytes of one to the other?
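
On the byte-appending question, a small self-contained sketch of the one detail that tends to bite. A sequence file records its key and value class names in the header, so the Text assumption can be checked rather than taken for granted. And a Text object reuses its backing array, so only the first getLength() bytes are valid. Buf below is a made-up stand-in for Text that mimics that reuse, join() is a hypothetical helper, and the '\001' delimiter is an assumption based on Hive's default field separator.

```java
import java.nio.charset.StandardCharsets;

class TextConcatSketch {

    // Hypothetical stand-in for a Text-like writable: a reusable backing
    // array of which only the first `length` bytes are valid.
    static final class Buf {
        byte[] bytes = new byte[0];
        int length = 0;

        void set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < src.length) bytes = new byte[src.length];
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length; // bytes past this index may be stale
        }
    }

    // Builds key + delimiter + value, copying only the valid prefix of each buffer.
    static byte[] join(Buf key, byte delimiter, Buf value) {
        byte[] out = new byte[key.length + 1 + value.length];
        System.arraycopy(key.bytes, 0, out, 0, key.length);
        out[key.length] = delimiter;
        System.arraycopy(value.bytes, 0, out, key.length + 1, value.length);
        return out;
    }

    public static void main(String[] args) {
        Buf key = new Buf();
        Buf value = new Buf();
        key.set("a-long-key"); // first record grows the backing array
        key.set("k1");         // second record reuses it; stale bytes remain past length
        value.set("v1");

        byte[] row = join(key, (byte) 1, value);
        // Show the Ctrl-A delimiter as '|' so the result is printable.
        System.out.println(new String(row, StandardCharsets.UTF_8).replace('\u0001', '|'));
    }
}
```

Copying key.bytes in full instead of the first key.length bytes would drag the leftover bytes of the earlier, longer key into the row.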

Bobby
On Oct 6, 2009, at 8:10 PM, Zheng Shao wrote:

> Hi Bobby,
>
> We just need a special FileInputFormat. It should be able to read a
> SequenceFile and prepend the key to the value before the record is
> returned to the Hive framework.
>
> Then in Hive language, we can say:
>
> add jar my.jar;
> CREATE TABLE mytable (key STRING, value STRING)
> STORED AS INPUTFORMAT 'com.my.inputformat' OUTPUTFORMAT
> 'org.apache.hadoop.mapred.SequenceFileOutputFormat';
>
> See http://issues.apache.org/jira/browse/HIVE-177
>
> You may also want to write your own output format that splits the
> row passed in into key and value and stores them separately. But that
> is not needed unless you want to use Hive to INSERT into this table
> (LOAD does NOT need this).
>
> Zheng
>
> On Tue, Oct 6, 2009 at 6:19 PM, Bobby Rullo <bobby@metaweb.com> wrote:
> Hi there,
>
> It seems that Hive ignores the key when reading Hadoop sequence
> files. Is there a way to make it not do that?
>
> If there's no way to do this with a 'stock' Hive build, could  
> someone point me to the code that reads sequence files in Hive and I  
> can have a go at it? It's sort of a show-stopper for us - we have a  
> bunch of large files where the key field is important.
>
> Bobby
>
>
>
> -- 
> Yours,
> Zheng

