crunch-user mailing list archives

From Ben Watson <benwatson...@gmail.com>
Subject Output Sequence Files into ORC
Date Mon, 14 Sep 2015 13:29:25 GMT
Hi all,

I'm trying to write a simple converter in Crunch to turn SequenceFiles
into ORC files. The only examples I can find for dealing with ORC files
are the tutorial at
http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/ and
the discussion at https://issues.apache.org/jira/browse/CRUNCH-450. The
tutorial only seems to show how to output data that's already in ORC
format, which isn't much use to me here.

It would be nice to be able to output ORC files the way you can with Java
MapReduce
(http://hadoopathome.logdown.com/posts/277986-using-multipleoutputs-with-orc-in-mapreduce):
specify a struct, parse each record into some type of object, and let the
output format do the rest. I've tried to replicate this in Crunch by
writing a MapFn that turns each record into an OrcWritable, but it doesn't
work, and even if it did I suspect it wouldn't be very efficient.

Is this already possible in Crunch, and I'm just missing it?

Thanks,

Ben
