incubator-crunch-user mailing list archives

From Ashish <paliwalash...@gmail.com>
Subject Re: Finding Input Split from DoFn
Date Thu, 22 Nov 2012 14:41:53 GMT
The post is complete. Here is the link:
http://goo.gl/tBptp

Comments, suggestions, and more ideas are welcome.


On Thu, Nov 22, 2012 at 7:28 PM, Ashish <paliwalashish@gmail.com> wrote:

> Thanks, Josh!
>
> It worked; my inverted index example using Crunch is complete. I am slowly
> getting addicted to the Crunch coding style.
>
>
> On Thu, Nov 22, 2012 at 4:05 PM, Josh Wills <jwills@cloudera.com> wrote:
>
>> getContext() from inside of a DoFn during or after initialize() will
>> return the TaskInputOutputContext, which will be a MapContext when you call
>> it from a Mapper, and MapContext has a getInputSplit() method. We don't
>> normally want a DoFn to worry about whether it's on the map-side or the
>> reduce-side of a MapReduce job, so we don't indicate the distinction by
>> default, which means you need to do something like:
>>
>> if (getContext() instanceof MapContext) {
>>   InputSplit split = ((MapContext) getContext()).getInputSplit();
>> }
>>
>> which is a little ugly; sorry about that.
>>
>> J
>>
>>
>> On Thu, Nov 22, 2012 at 1:45 AM, Ashish <paliwalashish@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Is there a way to find the InputSplit from within an implementation of
>>> DoFn?
>>>
>>> I am trying to implement an inverted index example using Crunch. I have
>>> tried peeking at the DoFn code, but couldn't find a way to retrieve the
>>> InputSplit. Can someone point me in the right direction?
>>>
>>> --
>>> thanks
>>> ashish
>>>
>>> Blog: http://www.ashishpaliwal.com/blog
>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>
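
The instanceof check from Josh's reply can be sketched in isolation as follows. This is a minimal sketch, not Crunch or Hadoop code: Split, TaskContext, and MapSideContext are hypothetical stand-ins for Hadoop's InputSplit, TaskInputOutputContext, and MapContext, and SplitLookup.findSplit is an illustrative helper, so the example compiles without Hadoop on the classpath. In a real DoFn the context would come from getContext().

```java
import java.util.Optional;

// Hypothetical stand-ins for the Hadoop types named in the thread.
// They exist only so this sketch compiles on its own.
interface Split {                                // stands in for InputSplit
    String describe();
}

interface TaskContext { }                        // stands in for TaskInputOutputContext

interface MapSideContext extends TaskContext {   // stands in for MapContext
    Split getInputSplit();
}

final class SplitLookup {
    private SplitLookup() { }

    // Returns the split only when the context is a map-side context.
    // For any other context (e.g. reduce side) the result is empty.
    static Optional<Split> findSplit(TaskContext context) {
        if (context instanceof MapSideContext) {
            return Optional.ofNullable(((MapSideContext) context).getInputSplit());
        }
        return Optional.empty();
    }
}

class SplitLookupDemo {
    public static void main(String[] args) {
        Split split = () -> "docs/part-00000";        // illustrative split description
        MapSideContext mapSide = () -> split;          // map-side context exposing the split
        TaskContext reduceSide = new TaskContext() { }; // context with no split to offer

        System.out.println(SplitLookup.findSplit(mapSide).map(Split::describe).orElse("no split"));
        System.out.println(SplitLookup.findSplit(reduceSide).isPresent());
    }
}
```

Returning an empty Optional on the reduce side, rather than casting unconditionally, avoids a ClassCastException and keeps the function usable on either side of the job.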



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
