crunch-dev mailing list archives

From Tom White <...@cloudera.com>
Subject Output Committers and Crunch Targets
Date Wed, 29 Jan 2014 16:11:02 GMT
Hi,

I'm writing a Crunch Target that is a MapReduceTarget but not a
PathTarget: it writes to files in a partitioned manner, so there
is not necessarily a single output path. I'm confused about the 'name'
parameter in configureForMapReduce(), though. I would expect that
named outputs would not be used in my simple pipeline, so name would
be null, but the value actually passed in seems to be 'out0'. So
my first question is: what determines when named outputs are used?

In the past this hasn't been a problem (e.g. with the Parquet target),
but this output format has a custom output committer, which isn't being
used. Instead, it looks like Crunch is using the default file committer,
so the job fails. Is it possible to use custom output committers with
Crunch?

My code is here:
https://github.com/tomwhite/kite/blob/CDK-251-mr/kite-data/kite-data-crunch/src/main/java/org/kitesdk/data/crunch/DatasetTarget.java#L100

Cheers,
Tom
