crunch-dev mailing list archives

From Chao Shi <stepi...@live.com>
Subject Re: Support OutputCommitter?
Date Thu, 27 Feb 2014 15:34:18 GMT
Hi Tom,

I will have to use named-output. About your example DatasetTarget, is it
safe to call setOutputFormat() explicitly here? I guess this may conflict with
other targets that use the same trick. Is it possible for us to have a
general approach to get OutputCommitter to work?
Hi Chao,

Crunch doesn't call the output committer explicitly itself; it's
called by the MR framework as a normal part of running a job. However,
in Crunch's MapReduceTarget#configureForMapReduce the output format is
not typically set for the named-output case (which is the only case
that is executed now, as I discovered in the thread mentioned below),
so it defaults to FileOutputFormat, with its semantics. (This is why
HBaseTarget calls FileOutputFormat.setOutputPath, which it wouldn't
have to if it set the output format explicitly to HBase's
TableOutputFormat.)
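
To illustrate the point above, here is a minimal sketch in plain Java of how a committer ends up being chosen by the output format rather than by the caller. Every type here (Committer, Format, FileFormat, CustomFormat, JobConfig) is a hypothetical stand-in written for this example; none of them is a Hadoop or Crunch class, and the real framework does considerably more. The only assumption carried over from the message is that the job falls back to a file-based format when no output format is set.

```java
public class CommitterLookupSketch {

    /** Hypothetical stand-in for an output committer; only its name matters here. */
    interface Committer {
        String name();
    }

    /** Hypothetical stand-in for an output format, which decides the committer. */
    interface Format {
        Committer getOutputCommitter();
    }

    /** Stand-in for the file-based default format. */
    static class FileFormat implements Format {
        public Committer getOutputCommitter() {
            return () -> "FileOutputCommitter";
        }
    }

    /** Stand-in for a format that ships its own committer. */
    static class CustomFormat implements Format {
        public Committer getOutputCommitter() {
            return () -> "CustomCommitter";
        }
    }

    /** Stand-in for the job configuration. */
    static class JobConfig {
        private Format outputFormat; // stays null until set explicitly

        void setOutputFormat(Format format) {
            this.outputFormat = format;
        }

        /** Falls back to the file-based format when nothing was set. */
        Format getOutputFormat() {
            return outputFormat != null ? outputFormat : new FileFormat();
        }
    }

    /** Mimics the framework asking the configured format for its committer. */
    static String committerUsedBy(JobConfig job) {
        return job.getOutputFormat().getOutputCommitter().name();
    }

    public static void main(String[] args) {
        JobConfig job = new JobConfig();
        // No output format set: the file-based committer is picked.
        System.out.println(committerUsedBy(job));

        // Output format set explicitly: its own committer is picked.
        job.setOutputFormat(new CustomFormat());
        System.out.println(committerUsedBy(job));
    }
}
```

In this toy model a custom committer is only ever reached once the job's output format is set explicitly, which matches the behaviour described in the original question.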

Are you setting the HCatOutputFormat in the named-output case? In the
Crunch Target I'm writing I've set the OutputFormat explicitly:
https://github.com/tomwhite/kite/blob/CDK-308-dataset-output-format/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L106

Cheers,
Tom

On Thu, Feb 27, 2014 at 7:54 AM, Gabriel Reid <gabriel.reid@gmail.com>
wrote:
> For reference, here's the link to the previous thread on this:
>
> http://mail-archives.apache.org/mod_mbox/crunch-dev/201401.mbox/%3cCAF-WD4Sig2n7yMxiZSji8trQy-8wfUy5_7dnKC=dkSxmrfSPVA@mail.gmail.com%3e
>
> On Thu, Feb 27, 2014 at 7:56 AM, Josh Wills <jwills@cloudera.com> wrote:
>> +tom
>>
>> Didn't Tom have a thing like this a little while ago?
>>
>>
>> On Wed, Feb 26, 2014 at 8:04 PM, Chao Shi <stepinto@live.com> wrote:
>>
>>> Hi crunch devs,
>>>
>>> I'm developing a target wrapper for HCatOutputFormat, which uses a custom
>>> OutputCommitter to get results committed to Hive. It seems its
>>> OutputCommitter is not called at all. Looking into the code, I can't find
>>> where Crunch calls it. Is it really supported?
>>>
>>> Thanks,
>>> Chao
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
