crunch-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-481) Support independent output committers for multiple outputs
Date Fri, 06 Feb 2015 16:27:36 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309392#comment-14309392 ]

Tom White commented on CRUNCH-481:
----------------------------------

I'm not sure that changing the job ID is enough - for example FileOutputCommitter uses the
task attempt ID to create directories, and I think it may be constructed from a different
Job instance (so it wouldn't get the "decorated" ID). It would be good if the OutputCommitter
implementations could take account of the output name, but I'm not sure how to do that for
FileOutputCommitter - have a Crunch-specific one? (Doing it for Kite is not a problem.) 
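
To make the concern concrete, here is a minimal, self-contained sketch in plain Java (no Hadoop classes). It assumes, purely for illustration, that two named outputs share one working directory and that the committer derives its scratch directory from the task attempt ID alone. The class, the method names, and the directory layout are hypothetical and are not FileOutputCommitter's actual implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PerOutputAttemptDirs {

    // Hypothetical stand-in for a committer that builds its scratch
    // directory from the task attempt ID only (layout is illustrative).
    static String idOnlyDir(String workDir, String taskAttemptId) {
        return workDir + "/_temporary/" + taskAttemptId;
    }

    // Hypothetical variant that also takes the output name into account,
    // as the comment suggests a committer might need to do.
    static String nameAwareDir(String workDir, String outputName, String taskAttemptId) {
        return workDir + "/_temporary/" + outputName + "/" + taskAttemptId;
    }

    public static void main(String[] args) {
        // Assumed values: a shared working directory and a single,
        // undecorated task attempt ID seen by every committer.
        String workDir = "/tmp/crunch-work";
        String attempt = "attempt_201502061627_0001_r_000000_0";
        List<String> outputs = Arrays.asList("out0", "out1");

        Set<String> idOnly = new LinkedHashSet<>();
        Set<String> nameAware = new LinkedHashSet<>();
        for (String name : outputs) {
            idOnly.add(idOnlyDir(workDir, attempt));
            nameAware.add(nameAwareDir(workDir, name, attempt));
        }

        // With ID-only paths both outputs map to the same directory;
        // with name-aware paths each output gets its own.
        System.out.println("distinct ID-only dirs:    " + idOnly.size());
        System.out.println("distinct name-aware dirs: " + nameAware.size());
    }
}
```

Under those assumptions, decorating the job ID only helps if the decorated ID actually reaches the place where the task attempt ID is constructed; including the output name in the path sidesteps that dependency.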

CompositeOutputCommitter already has some special casing for FileOutputCommitter, which shows
that there is a problem here...

bq. I'm not very familiar with the committer logic, but for some reason this wasn't exposed
when running against Hadoop 1.

The committer logic for the new MR API in Hadoop 1 has some limitations. For example, it is
not called properly from the local job runner. For this reason, we don't use an output committer
in Kite under Hadoop 1: https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-mapreduce/src/main/java/org/kitesdk/data/mapreduce/DatasetKeyOutputFormat.java#L478-L481

> Support independent output committers for multiple outputs
> ----------------------------------------------------------
>
>                 Key: CRUNCH-481
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-481
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Aniket Kulkarni
>            Assignee: Josh Wills
>            Priority: Minor
>             Fix For: 0.12.0
>
>         Attachments: CRUNCH-481-hadoop-2-compat.patch, CRUNCH-481.patch, CRUNCH-481.patch, CRUNCH-481.patch, CRUNCH-481c.patch
>
>
> I faced this issue while trying to write to Kite and HDFS in the same pipeline. A similar
> issue was logged for Kite[1][2].
> I was attempting to write a PCollection to Kite and a different PTable to HDFS as a text
> file. The write to Kite succeeded; however, the write to HDFS produced only a _SUCCESS file
> and no text file.
> Commenting out the write to Kite seems to solve the issue, and I can then see the text file
> being written.
> [1] - https://issues.cloudera.org/browse/CDK-756
> [2] - http://mail-archives.apache.org/mod_mbox/crunch-dev/201401.mbox/%3CCAF-WD4QCUe0Toh3qewpDNnom3u786PVJLgH7T6Go_AbcTpLTaw@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
