incubator-hcatalog-user mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: Support custom file formats
Date Thu, 18 Jul 2013 16:45:01 GMT
On Tue, Jul 16, 2013 at 2:45 AM, Subroto Sanyal <sanyalsubroto@gmail.com> wrote:

> Thanks Alan,
>
> Just another thought.
> How about using a different InputFormat like: STORED as INPUTFORMAT
> com.myproject.MyOwnInputFormat ?
> Which is the best approach and why?
>

Hive and HCat divide the file format into two parts:

serde - translates each row to a sequence of bytes
file format - takes a set of rows (as bytes from the serde) and writes them
to disk

For text formats, the serde controls how the row is serialized and the text
file format puts newlines at the end of each row.
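
To make that split concrete, here is a toy sketch in Python. It is not Hive's
actual Java SerDe or InputFormat/OutputFormat API; the function names
serialize_row and write_text_file are made up for illustration, and the Ctrl-A
field delimiter is just an assumed default.

```python
import io


def serialize_row(row, delimiter=b"\x01"):
    """Hypothetical 'serde' step: turn one row into a sequence of bytes.

    The Ctrl-A delimiter is an assumption for this sketch.
    """
    return delimiter.join(str(field).encode("utf-8") for field in row)


def write_text_file(out, serialized_rows):
    """Hypothetical 'file format' step: store already-serialized rows.

    It never looks inside a row; it only appends a newline after each one.
    """
    for record in serialized_rows:
        out.write(record + b"\n")


rows = [(1, "alice"), (2, "bob")]
buffer = io.BytesIO()  # stands in for a file on HDFS
write_text_file(buffer, (serialize_row(r) for r in rows))
data = buffer.getvalue()
```

Swapping serialize_row changes how each row is encoded without touching the
container, and swapping write_text_file changes the container without touching
the row encoding.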

So the question is whether you are trying to control the serialization, the
file container, or both.

Note that newer file formats like ORC and Parquet combine the serde and
format because the serialization is integrated with the file format.

-- Owen


> Down the line, I would like to read the table from Pig as well.
>
>
> On Mon, Jul 15, 2013 at 7:12 PM, Alan Gates <gates@hortonworks.com> wrote:
>
>> All you need to do is write a Hive SerDe.  There is some documentation at
>> https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide.  Also
>> you can use existing SerDes in Hive as an example.
>>
>> Alan.
>>
>> On Jul 5, 2013, at 8:06 AM, Subroto Sanyal wrote:
>>
>> > Hi,
>> >
>> > Newbie question...
>> > I have my own file format. The files are saved on HDFS. I would like
>> HCatalog to let Hive read those files.
>> > Something like:
>> >
>> > Hive
>> > |
>> > HCatalog
>> > |
>> > MyFiles
>> >
>> > Where should I start?
>> >
>> > Is there any sample integration of other file formats which I can use as a
>> reference?
>> >
>> >
>> > --
>> > Cheers,
>> > Subroto Sanyal
>>
>>
>
>
> --
> Cheers,
> *Subroto Sanyal*
>
