crunch-dev mailing list archives

From "David Whiting (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-412) MemPipeline mode for simulating MapReduce quirks
Date Wed, 04 Jun 2014 16:31:04 GMT


David Whiting updated CRUNCH-412:


I didn't manage to get very far hacking on Crunch itself, but here's the (stupid) helper
we've been using so far, just with individual DoFns. It could work in an integrated way
by combining it with the SingleUseIterable in the HFunction. It's kinda
specific to Avro SpecificRecords though, so I'm guessing there could be something a lot more
intelligent that delegates to the PType of the collection, but I can't really figure out how that
would work.

> MemPipeline mode for simulating MapReduce quirks
> ------------------------------------------------
>                 Key: CRUNCH-412
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments:
>
> From a discussion on the mailing list, we'd like to have a MemPipeline mode that simulates
> a couple of the quirks of MapReduce/MRPipeline for more reliable testing, namely:
> 1) Shuffle code that re-uses reduce-side objects so we can detect bugs caused by object
> modification, and
> 2) Serializes/deserializes DoFns before running them in order to test for any non-serializable
> code that sneaks into a pipeline.
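
As a rough illustration of the two quirks listed above, here is a minimal plain-Java sketch. It does not use Crunch at all and is not the attached helper: Box, reusing, roundTrip, Doubler and BadFn are hypothetical names made up for this example. Box stands in for a mutable record such as an Avro SpecificRecord, and the two function classes stand in for DoFns.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class MapReduceQuirks {

    /** Hypothetical mutable value type, standing in for a reused reduce-side record. */
    static final class Box {
        int value;
        Box(int value) { this.value = value; }
    }

    /** Quirk 1: hands out the SAME Box instance on every step, overwriting its contents. */
    static Iterable<Box> reusing(List<Integer> values) {
        return () -> new Iterator<Box>() {
            private final Box shared = new Box(0);
            private final Iterator<Integer> it = values.iterator();
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public Box next() { shared.value = it.next(); return shared; }
        };
    }

    /** Quirk 2: round-trips an object through Java serialization before it is used. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    /** A function whose state is fully serializable. */
    static final class Doubler implements Function<Integer, Integer>, Serializable {
        @Override public Integer apply(Integer x) { return x * 2; }
    }

    /** A function that claims to be Serializable but holds a non-serializable field. */
    static final class BadFn implements Function<Integer, Integer>, Serializable {
        private final Object lock = new Object(); // java.lang.Object is not Serializable
        @Override public Integer apply(Integer x) { return x + 1; }
    }

    public static void main(String[] args) throws Exception {
        // Buggy pattern: keeping references to objects handed out by the iterable.
        List<Box> held = new ArrayList<>();
        for (Box b : reusing(Arrays.asList(1, 2, 3))) {
            held.add(b);
        }
        // All three entries alias one object, so each shows the last value written.
        System.out.println(held.get(0).value + " " + held.get(1).value + " " + held.get(2).value);

        // A well-behaved function survives the round trip and still works.
        System.out.println(roundTrip(new Doubler()).apply(21));

        // The bad one fails at serialization time rather than deep inside a cluster job.
        try {
            roundTrip(new BadFn());
        } catch (NotSerializableException e) {
            System.out.println("not serializable: " + e.getMessage());
        }
    }
}
```

Code that copies each Box before storing it would pass under the reusing iterable, while code that stores the reference directly would not, which is the kind of bug item 1 is meant to expose. How either idea would be wired into MemPipeline itself is left open here.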

This message was sent by Atlassian JIRA
