crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Re: Output Sequence Files into ORC
Date Mon, 14 Sep 2015 20:50:52 GMT
Ben,

You might look at the OrcFileSourceTarget integration tests [1]. I'm not an
expert on ORC files, but it looks like they include a few examples of reading
and writing data.

[1] https://github.com/apache/crunch/blob/master/crunch-hive/src/it/java/org/apache/crunch/io/orc/OrcFileSourceTargetIT.java#L64

On Mon, Sep 14, 2015 at 8:29 AM, Ben Watson <benwatson528@gmail.com> wrote:

> Hi all,
>
> I'm trying to write a simple converter in Crunch to turn Sequence files
> into ORC files. The only examples that I can find for dealing with ORC
> files are the tutorial at
> http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/ and
> then the discussion at https://issues.apache.org/jira/browse/CRUNCH-450.
> The tutorial seems to only show how to output data that's already in ORC
> format, which isn't much use for me here.
>
> It would be nice to be able to output ORC files like you can with Java
> MapReduce -
> http://hadoopathome.logdown.com/posts/277986-using-multipleoutputs-with-orc-in-mapreduce
> - specifying a Struct, parsing each record into some type of object, and
> letting the output do the rest. I've tried to replicate this in Crunch by
> writing a MapFn that basically turns each record into an OrcWritable, but
> it doesn't work, and even if it did I suspect it wouldn't be very efficient.
>
> Is this something that's already possible that I'm missing?
>
> Thanks,
>
> Ben
>
