crunch-user mailing list archives

From Ben Watson <benwatson...@gmail.com>
Subject Re: Output Sequence Files into ORC
Date Tue, 15 Sep 2015 13:36:01 GMT
Hi Micah,

Thanks for your help. It's good to see some more examples of ORC in Crunch.
The single ORC record created manually in the test setup is what I needed
to see.

Thanks,

Ben

On Mon, Sep 14, 2015 at 9:50 PM, Micah Whitacre <mkwhitacre@gmail.com>
wrote:

> Ben,
>
> You might look at the OrcFileSourceTarget integration tests[1].  I'm not an
> expert on ORC files, but it looks like it has a few examples of reading and
> writing data.
>
> [1] -
> https://github.com/apache/crunch/blob/master/crunch-hive/src/it/java/org/apache/crunch/io/orc/OrcFileSourceTargetIT.java#L64
>
> On Mon, Sep 14, 2015 at 8:29 AM, Ben Watson <benwatson528@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I'm trying to write a simple converter in Crunch to turn Sequence files
>> into ORC files. The only examples that I can find for dealing with ORC
>> files are the tutorial at
>> http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/ and
>> then the discussion at https://issues.apache.org/jira/browse/CRUNCH-450.
>> The tutorial seems to only show how to output data that's already in ORC
>> format, which isn't much use for me here.
>>
>> It would be nice to be able to output ORC files like you can with Java
>> MapReduce -
>> http://hadoopathome.logdown.com/posts/277986-using-multipleoutputs-with-orc-in-mapreduce
>> - specifying a Struct, parsing each record into some type of object, and
>> letting the output do the rest. I've tried to replicate this in Crunch by
>> writing a MapFn that basically turns each record into an OrcWritable, but
>> it doesn't work, and even if it did, I suspect it wouldn't be very efficient.
>>
>> Is this something that's already possible that I'm missing?
>>
>> Thanks,
>>
>> Ben
>>
>
>
