crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-509) Crunch with Spark doesn't name all outputs
Date Fri, 08 May 2015 20:46:00 GMT


Gabriel Reid commented on CRUNCH-509:

At first, looking at the AvroOutputFormats, I was a bit confused about how that could still
work, but after looking at it a bit more it makes sense. I did sometimes find the plain-text
Avro schemas that were available in the job configuration handy for debugging, but obviously
they shouldn't be in there anymore if they don't need to be.

As for the Spark stuff, yeah, it's a bit hacky-looking, but I don't see any problem with it
(or any better option) as long as the outputs are being written out in a loop like that.

> Crunch with Spark doesn't name all outputs
> ------------------------------------------
>                 Key: CRUNCH-509
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.11.0
>            Reporter: Micah Whitacre
>            Assignee: Josh Wills
>             Fix For: 0.12.0
>         Attachments: CRUNCH-509.patch, CRUNCH-509b.patch
> Crunch currently does not "name" all outputs when running with a SparkPipeline.  This
> becomes a problem as some Targets (based on CRUNCH-82) have coded in checks to ensure that
> the name is populated.  Specifically, the implementation I'm running into issues with
> is the Kite DatasetTarget[2].
> Need to read up a bit on context to see whether it is a Crunch or Kite issue and where it
> is easiest/most correct to fix.  [~jwills] or [~tomwhite] feedback would be welcome.
> [1] -
> [2] -

This message was sent by Atlassian JIRA
