hadoop-hive-dev mailing list archives

From Zheng Shao <zsh...@gmail.com>
Subject Re: how to write a SerDe
Date Thu, 09 Jul 2009 06:52:09 GMT
Hi Roberto,

Hive does support custom file input/output formats other than
SequenceFile/TextFile.
Please take a look at Hive.g, which contains the grammar for specifying
the InputFileFormat and OutputFileFormat.
You probably only need to write an InputFileFormat if you only plan to
read from this kind of file.
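
As a rough sketch of what that grammar allows, the table definition can
name the SerDe and the file format classes directly instead of using
STORED AS SEQUENCEFILE/TEXTFILE. The small Java helper below only
assembles such a statement as a string. The table name, column list, and
all com.example class names are hypothetical placeholders, and the exact
clause accepted should be checked against Hive.g for the Hive version in
use.

```java
// Sketch only: assembles a CREATE TABLE statement that names a custom SerDe
// and custom file format classes. Nothing here talks to Hive; the class and
// table names passed in by the caller are placeholders, not real classes.
class CustomFormatDdl {

    static String createTable(String table, String columns, String serde,
                              String inputFormat, String outputFormat) {
        return "CREATE TABLE " + table + " (" + columns + ")\n"
             + "ROW FORMAT SERDE '" + serde + "'\n"
             + "STORED AS INPUTFORMAT '" + inputFormat + "'\n"
             + "OUTPUTFORMAT '" + outputFormat + "'";
    }

    public static void main(String[] args) {
        System.out.println(createTable(
            "events", "user_id STRING, clicks INT",
            "com.example.hive.MyRecordSerDe",          // hypothetical SerDe
            "com.example.hive.MyRecordInputFormat",    // hypothetical input format
            "com.example.hive.MyRecordOutputFormat")); // hypothetical output format
    }
}
```

For a read-only table, the output format named here could be an existing
one rather than a class written for the purpose.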

The InputFileFormat can return a BytesWritable that contains the data
in binary format.
The SerDe.deserialize function should then take that BytesWritable and
convert it into a hierarchical object.
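
The byte-to-object step might look roughly like the sketch below. It
uses only the JDK so that it runs on its own: in a real SerDe this logic
would sit inside deserialize and work on the bytes held by the
BytesWritable, with an ObjectInspector describing the result to Hive.
The record layout (an entry count followed by key/value string pairs)
and the RecordDecoder class are assumptions made up for illustration.
They are not the proprietary format from the original question and not
part of Hive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical decoder for an assumed record layout; not a Hive class.
class RecordDecoder {

    // Decodes one record laid out as: int entryCount, then entryCount pairs
    // of (key, value) strings as written by DataOutputStream.writeUTF.
    // 'length' is the number of valid bytes, which may be less than data.length.
    static Map<String, String> decode(byte[] data, int length) {
        try {
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(data, 0, length));
            int entryCount = in.readInt();
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < entryCount; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                record.put(key, value);
            }
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a sample record in the same assumed layout, for demonstration only.
    static byte[] encodeSample() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(2);
            out.writeUTF("user_id");
            out.writeUTF("alice");
            out.writeUTF("clicks");
            out.writeUTF("3");
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = encodeSample();
        System.out.println(decode(bytes, bytes.length));
    }
}
```

Passing the valid length separately matters because a reusable byte
buffer can be larger than the record it currently holds.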

For an example of SerDe, please take a look at
https://issues.apache.org/jira/browse/HIVE-553
That issue contains a complete, self-contained SerDe.

I also plan to write a how-to for writing a SerDe, but it won't be
ready for at least a week or two.

Zheng

On Wed, Jul 8, 2009 at 6:46 PM, Roberto Congiu<roberto.congiu@openx.org> wrote:
> Hi, I am writing a SerDe class to be able to query some proprietary format we
> have from Hive.
> The format is basically a sequence of records that are maps coded in binary
> for which we have access libraries.
> The file is also gzipped.
>
> From what I understand, I need to
> 1 - write a FileInputFormat class to read the file and extract the single
> records as Writables (but I am not clear how I tell hive to use this
> fileformat since all I can use is STORED AS SEQUENCEFILE/TEXTFILE. How do I
> plug my format in there? )
> 2 - Write a SerDe (Since I just need to read it I need just the deserializer
> part) and an ObjectInspector to let hive understand how to find a column
>
> Is there any info around for these, or somebody who's done something
> similar?
> Thanks in advance,
> Roberto
>



-- 
Yours,
Zheng
