crunch-dev mailing list archives

From Tom White <...@cloudera.com>
Subject Re: Support OutputCommitter?
Date Thu, 27 Feb 2014 17:03:11 GMT
Is it possible to have multiple targets that Crunch runs in one
MapReduce job? If so then there will be a conflict, and Crunch will
need some changes to support this case.

Tom
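
To make the conflict above concrete, here is a toy model in plain Java. It is not Crunch or Hadoop code: the class, the methods, and the configuration key are all hypothetical, and it assumes only that a job holds a single output-format setting which every target overwrites when it sets the format explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OutputFormatConflictSketch {
    // Hypothetical key standing in for the job's single output-format setting.
    static final String OUTPUT_FORMAT_KEY = "job.outputformat.class";

    /** A "target" sets its own output format on the shared job configuration. */
    static void configureTarget(Map<String, String> jobConf, String formatClassName) {
        jobConf.put(OUTPUT_FORMAT_KEY, formatClassName);
    }

    /** Configures the targets in order and returns the format the job ends up with. */
    static String effectiveFormat(String... formatClassNames) {
        Map<String, String> jobConf = new LinkedHashMap<>();
        for (String name : formatClassNames) {
            configureTarget(jobConf, name);
        }
        return jobConf.get(OUTPUT_FORMAT_KEY);
    }

    public static void main(String[] args) {
        // Two targets planned into the same job: the later one silently wins.
        System.out.println(effectiveFormat("DatasetOutputFormat", "HCatOutputFormat"));
    }
}
```

In this model the last target configured decides the job-level format, so only that format's committer would be the one the framework sees. Whether Crunch actually plans several such targets into one job is the open question in the message above.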

On Thu, Feb 27, 2014 at 3:34 PM, Chao Shi <stepinto@live.com> wrote:
> Hi Tom,
>
> I will have to use named-output. About your example DatasetTarget, is it
> safe to call setOutputFormat() explicitly here? I guess this may conflict with
> other targets that use the same trick. Is it possible for us to have a
> general approach to get OutputCommitter to work?
> Hi Chao,
>
> Crunch doesn't call the output committer explicitly itself; it's
> called by the MR framework as a normal part of running a job. However,
> in Crunch's MapReduceTarget#configureForMapReduce the output format is
> not typically set for the named-output case (which is the only case
> that is executed now, as I discovered in the thread mentioned below),
> so it defaults to FileOutputFormat, with its semantics. (This is why
> HBaseTarget calls FileOutputFormat.setOutputPath, which it wouldn't
> have to if it set the output format explicitly to HBase's
> TableOutputFormat.)
>
> Are you setting the HCatOutputFormat in the named-output case? In the
> Crunch Target I'm writing I've set the OutputFormat explicitly:
> https://github.com/tomwhite/kite/blob/CDK-308-dataset-output-format/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L106
>
> Cheers,
> Tom
>
> On Thu, Feb 27, 2014 at 7:54 AM, Gabriel Reid <gabriel.reid@gmail.com>
> wrote:
>> For reference, here's the link to the previous thread on this:
>>
>> http://mail-archives.apache.org/mod_mbox/crunch-dev/201401.mbox/%3cCAF-WD4Sig2n7yMxiZSji8trQy-8wfUy5_7dnKC=dkSxmrfSPVA@mail.gmail.com%3e
>>
>> On Thu, Feb 27, 2014 at 7:56 AM, Josh Wills <jwills@cloudera.com> wrote:
>>> +tom
>>>
>>> Didn't Tom have a thing like this a little while ago?
>>>
>>>
>>> On Wed, Feb 26, 2014 at 8:04 PM, Chao Shi <stepinto@live.com> wrote:
>>>
>>>> Hi crunch devs,
>>>>
>>>> I'm developing a target wrapper for HCatOutputFormat, which uses a custom
>>>> OutputCommitter to get results committed to Hive. It seems its
>>>> OutputCommitter is not called at all. Looking into the code, I can't find
>>>> where Crunch calls it. Is it really supported?
>>>>
>>>> Thanks,
>>>> Chao
>>>>
>>>
>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
