hive-user mailing list archives

From Bobby Rullo
Subject Re: key part of sequence files
Date Thu, 05 Nov 2009 16:56:40 GMT
I had the exact same question, and Zheng told me I had to implement a  
new FileInputFormat, so I extended SequenceFileInputFormat, and it  
worked out pretty well.

If you like, I can post the source code somewhere (here?), but it was  
pretty easy.
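
For anyone reading this in the archive, here is a minimal sketch of the general idea, not the actual source mentioned above. It assumes the approach is to wrap the record reader so that each key is folded into the value, tab-separated, which lets Hive's usual value-only deserialization see every column. To keep the sketch self-contained, PairReader is a hypothetical stand-in for Hadoop's old-API RecordReader<Text, Text>, and StringBuilder stands in for Text. In a real job the wrapper would presumably be returned from getRecordReader() in a subclass of SequenceFileInputFormat.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Hadoop's RecordReader<Text, Text>:
// next() fills both holders and returns false at end of input.
interface PairReader {
    boolean next(StringBuilder key, StringBuilder value);
}

// Wraps another reader and rewrites each value as key + TAB + value,
// so a consumer that only looks at the value still sees the key columns.
final class KeyIntoValueReader implements PairReader {
    private final PairReader inner;
    private final StringBuilder innerValue = new StringBuilder();

    KeyIntoValueReader(PairReader inner) {
        this.inner = inner;
    }

    @Override
    public boolean next(StringBuilder key, StringBuilder value) {
        key.setLength(0);
        innerValue.setLength(0);
        if (!inner.next(key, innerValue)) {
            return false; // end of input, holders left untouched
        }
        value.setLength(0);
        value.append(key).append('\t').append(innerValue);
        return true;
    }
}

// In-memory reader used only to exercise the wrapper in this sketch.
final class ListReader implements PairReader {
    private final Iterator<String[]> it;

    ListReader(List<String[]> pairs) {
        this.it = pairs.iterator();
    }

    @Override
    public boolean next(StringBuilder key, StringBuilder value) {
        if (!it.hasNext()) {
            return false;
        }
        String[] pair = it.next();
        key.setLength(0);
        key.append(pair[0]);
        value.setLength(0);
        value.append(pair[1]);
        return true;
    }
}
```

With a wrapper like this, a record whose key is "k1<TAB>k2" and whose value is "v1<TAB>v2" would reach Hive as the single value "k1<TAB>k2<TAB>v1<TAB>v2", so the table can declare four tab-delimited columns. The table would also need to name the custom input format class in its definition.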

On Nov 5, 2009, at 8:20 AM, Andrey Pankov wrote:

> Hi guys,
> We have a lot of data stored in compressed SequenceFiles (SEQ). Since a
> SEQ file is a sequence of (key, value) pairs, we store one set of columns,
> joined by tabs, in the key part, and another set of columns, joined the
> same way, in the value part. So our SEQ files are of type (Text, Text).
> Hive does not read such files the way we need, i.e. I'm not satisfied
> with its defaults: it ignores the key part of each record, while the
> value part is deserialized into a set of columns successfully.
> Can someone please point me to how to get Hive not to ignore the SEQ key?
> Thanks.
> -- 
> Andrey Pankov
