hive-user mailing list archives

From "Owen O'Malley"
Subject Re: Writing Sequence Files
Date Mon, 04 May 2015 22:03:27 GMT
On Mon, May 4, 2015 at 11:02 AM, Grant Overby (groverby) wrote:

>   I’m looking for some sample code to write a hive compatible sequence
> file for an external table and matching ddl.

In general, the easiest way is to create a table with the layout you'd like
to have and use Hive to write to that table.

> I’m starting with a java pojo. I can create an Object Inspector for this
> class. I’m reasonably sure I can write a serde leveraging java’s
> externalizable serialization. I’m coming up a bit short on how to wire this
> together.

Ok, to make Hive happy you need to pick a serde. The default is
LazySimpleSerDe, so let's assume you'll use that one:

hive> create table people(name string, id int) stored as sequencefile;

The files for that table will look like:
SequenceFile - key: BytesWritable, value: Text
The key is ignored and the value will be the same string that would have been
used in a text file, with the columns separated by ^A:

<name>^A<id>

where ^A is control-A.
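
As a rough illustration of the row format described above, here is a minimal Java sketch that builds the value string for one row of the people table. The class PeopleRow, the method toValue, and the sample values are hypothetical names made up for this example, not part of Hive. The sketch assumes the default control-A field delimiter and does no escaping, so a name that itself contains control-A would break the row.

```java
// Hypothetical helper, not part of Hive: builds the value string for one
// row of the people(name string, id int) table.
final class PeopleRow {
    // Assumed default field delimiter for delimited text: control-A (0x01).
    static final char FIELD_DELIMITER = '\u0001';

    // Joins the two columns in table order: name, then id.
    static String toValue(String name, int id) {
        return name + FIELD_DELIMITER + id;
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        String value = toValue("alice", 42);
        // Show the delimiter as ^A so it is visible on a terminal.
        System.out.println(value.replace(String.valueOf(FIELD_DELIMITER), "^A"));
    }
}
```

The resulting string is what would go into the Text value of each record; the key can be left empty since it is ignored.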

> My end goal is to have this file queryable while I’m writing to it. I
> don’t know if Hive will work this way out of the box. Perhaps I’ll need a
> modified InputFormat to skip over incomplete rows?

The SequenceFile reader isn't very tolerant of incomplete files. You would
probably want an InputFormat that finds the last sync marker in the sequence
file and only reads up to that. Of course, if your file is complete, that
will skip the last set of rows, so you'd need to know the difference between
incomplete and complete files.
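
To make the marker idea a bit more concrete, here is a small Java sketch of just the scanning step: given the bytes written so far and the marker, it finds where the last complete marker starts, which is the point a tolerant reader would stop at. The class SyncScan and the method lastMarkerOffset are hypothetical names for this example, not Hadoop or Hive APIs. It works on an in-memory byte array and ignores the details of the on-disk format; a real InputFormat would take the marker from the file header and scan the stream instead.

```java
// Hypothetical helper, not a Hadoop or Hive API: locates the last complete
// marker in a partially written file held in memory.
final class SyncScan {
    // Returns the offset where the last complete occurrence of marker starts
    // within data[from, length), or -1 if no complete marker is present.
    static int lastMarkerOffset(byte[] data, int from, int length, byte[] marker) {
        if (marker.length == 0) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        // Walk backwards so the first match found is the last one in the file.
        for (int start = length - marker.length; start >= from; start--) {
            if (matchesAt(data, start, marker)) {
                return start;
            }
        }
        return -1;
    }

    // True if marker appears in data beginning at offset start.
    private static boolean matchesAt(byte[] data, int start, byte[] marker) {
        for (int i = 0; i < marker.length; i++) {
            if (data[start + i] != marker[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Toy data with a 2-byte marker; made up for illustration only.
        byte[] marker = {1, 2};
        byte[] data = {9, 1, 2, 5, 5, 1, 2, 7};
        System.out.println(lastMarkerOffset(data, 0, data.length, marker));
    }
}
```

Passing a start offset lets the caller skip the file header, and records before the returned offset can be read normally. For a complete file this still cuts off everything after the last marker, so the caller has to decide separately whether the file is finished.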

You might look at the work we did with the streaming ingest and ORC files.

.. Owen
