crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Finding Input Split from DoFn
Date Thu, 22 Nov 2012 10:35:46 GMT
getContext() from inside of a DoFn during or after initialize() will return
the TaskInputOutputContext, which will be a MapContext when you call it
from a Mapper, and MapContext has a getInputSplit() method. We don't
normally want a DoFn to worry about whether it's on the map-side or the
reduce-side of a MapReduce job, so we don't indicate the distinction by
default, which means you need to do something like:

if (getContext() instanceof MapContext) {
  // Only the map side has an input split; the cast is safe inside this check.
  InputSplit split = ((MapContext) getContext()).getInputSplit();
}

which is a little ugly-- sorry about that.
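
For anyone who wants to try the guard-and-cast pattern without a Hadoop classpath, here is a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the nested TaskContext, MapContext and InputSplit interfaces are hypothetical stand-ins for the real Hadoop types, and findInputSplit() and describe() are made-up names, not part of Crunch or Hadoop.

```java
import java.util.Optional;

// Dependency-free sketch: the nested types below are hypothetical stand-ins
// for Hadoop's TaskInputOutputContext, MapContext and InputSplit.
final class SplitLookupSketch {

    /** Stand-in for an input split; describe() is a made-up method. */
    interface InputSplit {
        String describe();
    }

    /** Stand-in for the general context type that getContext() returns. */
    interface TaskContext { }

    /** Stand-in for the map-side context, the only one that has a split. */
    interface MapContext extends TaskContext {
        InputSplit getInputSplit();
    }

    /** Returns the split on the map side, or empty on the reduce side. */
    static Optional<InputSplit> findInputSplit(TaskContext context) {
        if (context instanceof MapContext) {
            return Optional.ofNullable(((MapContext) context).getInputSplit());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        InputSplit split = () -> "hypothetical-file.txt";
        TaskContext mapSide = (MapContext) () -> split;
        TaskContext reduceSide = new TaskContext() { };

        // Map side: the split is available.
        System.out.println(
            findInputSplit(mapSide).map(InputSplit::describe).orElse("no split"));
        // Reduce side: there is no split to return.
        System.out.println(
            findInputSplit(reduceSide).map(InputSplit::describe).orElse("no split"));
    }
}
```

In a real DoFn the argument would be getContext(), called during or after initialize(). Returning an Optional keeps the map-side check in one place, so the rest of the function does not need to know which side of the job it is running on. The concrete type of the split you get back depends on the input format in use, so it is worth checking with instanceof before casting it any further.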

J


On Thu, Nov 22, 2012 at 1:45 AM, Ashish <paliwalashish@gmail.com> wrote:

> Hi All,
>
> Is there a way to find the InputSplit from within an implementation of
> DoFn?
>
> I am trying to implement an inverted index example using Crunch. I have
> tried peeking in the DoFn code, but couldn't find a way to retrieve the
> InputSplit. Can someone point me in the right direction?
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
