hive-user mailing list archives

From Abhishek Girish <agir...@ncsu.edu>
Subject Re: Writing to an ORCFile using MapReduce + HCatalog APIs
Date Mon, 07 Apr 2014 18:42:05 GMT
Thanks Eugene.

I had gone over the link previously. I am not looking for a way to do this
via the Pig CLI. My data is going to be read by a *MapReduce job* (by Pig I
meant custom Pig source code), and hence I need input readers and output
writers. I need to be able to write using HCatOutputFormat.

I currently have a custom TextOutputFormat, which writes to a text file. I
instead need to *write data to an ORCFile*.

Hence I would need to define a schema and then write to the file. However,
I could not see a way to specify the storage type as ORCFile. Do we
manually have to create a Hive table stored as ORC and then use HCatalog to
load data into it? Is there no way to directly create an ORCFile
programmatically and write data into it?

Thanks again!

Regards,
Abhishek
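
To make the "create a Hive table stored as ORC first" route concrete, here is a minimal sketch of the DDL such a table would need, built from Java. The table name `rdf_triples`, the columns `s`, `p`, `o` and the helper class `OrcTableDdl` are hypothetical names chosen for illustration; only the `STORED AS ORC` clause is standard HiveQL. As far as I understand, HCatOutputFormat takes the storage format from the table definition in the metastore, so the MapReduce driver would only name the table, for example via HCatOutputFormat.setOutput(job, OutputJobInfo.create(dbName, tableName, null)), rather than specify ORC itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Sketch only: builds the HiveQL that declares an ORC-backed table.
 * The class, table and column names are hypothetical.
 */
final class OrcTableDdl {

    /** Returns a CREATE TABLE statement whose storage format is ORC. */
    static String build(String table, Map<String, String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        // Render "name type" pairs in insertion order, wrapped in parentheses.
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> c : columns.entrySet()) {
            cols.add(c.getKey() + " " + c.getValue());
        }
        return "CREATE TABLE " + table + " " + cols + " STORED AS ORC";
    }

    public static void main(String[] args) {
        // Assumed layout for RDF triples: one string column per triple part.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("s", "string");
        columns.put("p", "string");
        columns.put("o", "string");
        System.out.println(build("rdf_triples", columns));
    }
}
```

The generated statement would be run once through the Hive CLI (or any Hive client) before the job starts; the job itself then writes HCatRecord values to that table.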


On Mon, Apr 7, 2014 at 1:47 PM, Eugene Koifman <ekoifman@hortonworks.com> wrote:

> If you are writing from Pig using HCatStorer you don't need to create
> HCatSchema.
>
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore#HCatalogLoadStore-Usage.1
> has examples on how to do it.
>
> So if you create a Hive table that uses ORC, you should be able to write
> your Pig cursor to that table with a one-line command in your Pig script.
>
> Eugene
>
>
> On Sun, Apr 6, 2014 at 4:17 PM, Abhishek Girish <agirish@ncsu.edu> wrote:
>
>> Hi,
>>
>> I am working on custom Pig source code that writes RDF data into text
>> files. I was looking to instead *write to an ORCFile* for some of the
>> columnar benefits it offers.
>>
>> I understand that I need to use *HCatalog APIs*. I have an idea of how
>> to create an HCatSchema for my data, and I understand that I would need
>> to use HCatOutputFormat for writing into an ORCFile.
>>
>> I need some help on *how to specify the storage format as ORCFile.* I
>> see that ORC has built-in support, but I cannot find any examples of how
>> to specify which output format the HCatalog APIs write to (default Hive
>> table, RCFile, ORCFile, SequenceFile, etc.).
>>
>> I would then need to work on reading from these ORCFiles and
>> reconstructing the records.
>>
>> Any pointers would be appreciated. Thanks in advance.
>>
>> Regards,
>> Abhishek
>>
