incubator-crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-143) CrunchInputSplit should be public
Date Tue, 15 Jan 2013 17:58:13 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554054#comment-13554054 ]

Josh Wills commented on CRUNCH-143:
-----------------------------------

It's possible right now, just hacky: you end up calling ((MapContext) getContext()).getInputSplit()
in the DoFn. That said, I would be happy to make information about the input data that is
currently being processed easier for the client to access. Thoughts on what the API should
look like? Should it be very MapReduce-y, or should we wrap it in some kind of abstraction
that would also be valid for (say) in-memory pipelines?
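
To make the two options concrete, here is a minimal sketch of the cast workaround next to one
possible wrapper. Everything in it is a simplified stand-in and an assumption, not Crunch or
Hadoop code: TaskContext plays the role of the context a DoFn gets from getContext(),
MapSideContext plays the role of MapContext, the split is reduced to a plain String, and
InputInfo.currentInput is a hypothetical helper name, not a proposed or existing Crunch API.

```java
import java.util.Optional;

// Stand-in for the general task context a DoFn sees via getContext().
interface TaskContext {
}

// Stand-in for a map-side context: only this kind of context knows its input split.
interface MapSideContext extends TaskContext {
    // Simplified: a real input split is an object, not a String.
    String getInputSplit();
}

// Hypothetical wrapper that hides the downcast from client code.
final class InputInfo {
    private InputInfo() {
    }

    // Returns the current input description when the context exposes one,
    // and an empty Optional otherwise (for example, an in-memory pipeline).
    static Optional<String> currentInput(TaskContext context) {
        if (context instanceof MapSideContext) {
            // This is the cast that clients currently have to write themselves.
            return Optional.of(((MapSideContext) context).getInputSplit());
        }
        return Optional.empty();
    }
}
```

With a wrapper along these lines, client code would not need to know whether it runs inside a
map task: an in-memory pipeline would simply report no split instead of failing with a
ClassCastException.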
                
> CrunchInputSplit should be public
> ---------------------------------
>
>                 Key: CRUNCH-143
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-143
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4.0
>            Reporter: Dave Beech
>            Assignee: Josh Wills
>            Priority: Minor
>
> Similar to MAPREDUCE-2226 - it's currently not possible to access the underlying input
> split details, for instance the path on HDFS.
> Is there a nice way to make this information available from DoFn instances while keeping
> with the Crunch abstraction?
> Also - MAPREDUCE-4923 might be applicable to CrunchInputSplit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
