crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-552) Enable AvroParquet to work with Crunch-on-Spark
Date Tue, 28 Jul 2015 00:39:04 GMT


Josh Wills updated CRUNCH-552:
    Attachment: CRUNCH-552.patch

Attached is the patch for this, which does four things:

1) Makes Crunch's custom OutputFormat for Parquet public so Spark can access it,
2) Moves some of the avro test classes (Employee and Person) to the crunch-test module so
that they can be used by both crunch-core and crunch-spark,
3) Adds Avro/Parquet tests for Spark, and
4) Notes that crunch.namedoutput should be set to "out0" in Crunch-on-Spark so that the Avro
Parquet implementation will work properly.

> Enable AvroParquet to work with Crunch-on-Spark
> -----------------------------------------------
>                 Key: CRUNCH-552
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, IO
>    Affects Versions: 0.12.0
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.13.0
>         Attachments: CRUNCH-552.patch
> Via the mailing list, we got a bug report that Crunch's Parquet target classes did not
> work with Crunch-on-Spark. The most obvious problem was that Spark could not access the
> OutputFormat class that Crunch uses for writing Avro records as Parquet files, but there
> were a couple of other smaller issues that needed to be fixed as well.

This message was sent by Atlassian JIRA
