crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-481) Support independent output committers for multiple outputs
Date Thu, 04 Dec 2014 22:39:12 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234733#comment-14234733 ]

Josh Wills commented on CRUNCH-481:
-----------------------------------

So I had an oh-duh moment while working on the Spark impl: the Spark impl writes each output
individually, using the "native" OutputFormat for the Target (i.e., no multiple outputs), so
this change is only needed for MR jobs, not Spark jobs. Will commit in a few.

> Support independent output committers for multiple outputs
> ----------------------------------------------------------
>
>                 Key: CRUNCH-481
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-481
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Aniket Kulkarni
>            Assignee: Josh Wills
>            Priority: Minor
>         Attachments: CRUNCH-481.patch
>
>
> I faced this issue while trying to write to Kite and HDFS in the same pipeline. A similar
> issue was logged for Kite[1][2].
> I was attempting to write a PCollection to Kite and a different PTable to HDFS as a text
> file. The write to Kite succeeded, but the write to HDFS produced only a _SUCCESS file
> and no text file.
> Commenting out the write to Kite seems to solve the issue: the text file is then written
> as expected.
> [1] - https://issues.cloudera.org/browse/CDK-756
> [2] - http://mail-archives.apache.org/mod_mbox/crunch-dev/201401.mbox/%3CCAF-WD4QCUe0Toh3qewpDNnom3u786PVJLgH7T6Go_AbcTpLTaw@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
