crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-278) Improvements to MapsideJoin code
Date Fri, 11 Oct 2013 07:18:42 GMT


Josh Wills commented on CRUNCH-278:

Yeah, that's essentially it. The difference I had in mind is that the object you would create
to represent the data in the root PCollection plus the subsequent DoFns wouldn't be a
PCollection; it would be a ReadableSourceBundle (or something less wordy than that), which
avoids the problem of invalid write() calls. But the core idea (the DoSomethingFn processing
happening in memory in the mapper during the initialize() call) is the same.

> Improvements to MapsideJoin code
> --------------------------------
>                 Key: CRUNCH-278
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, MapReduce Patterns
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-278.patch
> The fact that we have special-case code in the MapsideJoinStrategy for the in-memory
> and MR-based Pipeline instances has always bugged me, so I set out to eliminate the distinction
> between the two impls by creating a new interface, ReadableSourceBundle<T>, that encapsulates
> the MR and in-memory specific logic for doing mapside joins in order to remove the special-case
> code in MapsideJoinStrategy and hopefully make other implementations that use our mapside-join
> patterns much easier to test.
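
For readers following the thread, the idea can be illustrated with a minimal sketch in plain
Java. This is not the code in CRUNCH-278.patch and it does not use the Crunch API: the
ReadableSourceBundle interface below, its read() and map() methods, and the JoinFn, inMemory
and parseLine names are all hypothetical stand-ins chosen for illustration. The sketch assumes
that a bundle only needs to be readable (so there is no write() to call incorrectly), and that
any chained function runs in memory when the join function's initialize() reads the bundle.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these types are the actual Crunch classes.
final class MapsideJoinSketch {

  /** Stand-in for the proposed interface: readable only, so no write() exists. */
  interface ReadableSourceBundle<T> {
    Iterable<T> read();

    /** Chains a transform; it is applied lazily, each time read() is called. */
    default <U> ReadableSourceBundle<U> map(Function<? super T, ? extends U> fn) {
      ReadableSourceBundle<T> parent = this;
      return () -> {
        List<U> out = new ArrayList<>();
        for (T t : parent.read()) {
          out.add(fn.apply(t));
        }
        return out;
      };
    }
  }

  /** In-memory bundle; an MR-backed one would read from files instead (assumption). */
  static <T> ReadableSourceBundle<T> inMemory(List<T> data) {
    return () -> data;
  }

  /** Plays the role of the map-side join function. */
  static final class JoinFn<K, L, R> {
    private final ReadableSourceBundle<Map.Entry<K, R>> small;
    private Map<K, List<R>> lookup;

    JoinFn(ReadableSourceBundle<Map.Entry<K, R>> small) {
      this.small = small;
    }

    /** Reads the small side (running any chained transforms) into a hash map. */
    void initialize() {
      lookup = new HashMap<>();
      for (Map.Entry<K, R> e : small.read()) {
        lookup.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
      }
    }

    /** Joins one record from the large side against the in-memory lookup. */
    List<String> process(K key, L left) {
      List<String> out = new ArrayList<>();
      for (R right : lookup.getOrDefault(key, Collections.<R>emptyList())) {
        out.add(key + ":" + left + "," + right);
      }
      return out;
    }
  }

  /** Example transform: parses "key=value" into an entry. */
  static Map.Entry<String, String> parseLine(String line) {
    String[] kv = line.split("=", 2);
    return new AbstractMap.SimpleEntry<>(kv[0], kv[1]);
  }

  public static void main(String[] args) {
    ReadableSourceBundle<String> raw = inMemory(Arrays.asList("1=apple", "2=pear"));
    ReadableSourceBundle<Map.Entry<String, String>> parsed =
        raw.map(MapsideJoinSketch::parseLine);

    JoinFn<String, String, String> fn = new JoinFn<>(parsed);
    fn.initialize(); // parseLine runs here, in memory
    System.out.println(fn.process("1", "red"));  // prints [1:red,apple]
    System.out.println(fn.process("3", "blue")); // prints []
  }
}
```

In this sketch the join function depends only on the bundle interface, so the same JoinFn
could be exercised in a unit test with inMemory(...) and, under the assumption above, in a
real job with a file-backed bundle.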

This message was sent by Atlassian JIRA