crunch-user mailing list archives

From: Gabriel Reid
Subject: Re: How to create in memory collection of iterable?
Date: Sun, 12 Jan 2014 07:07:38 GMT

Hi Jeremy,

On Sun, Jan 12, 2014 at 4:26 AM, Jeremy Lewi wrote:
> I ended up just creating a PTable<String, BowtieMapping> and then invoking
> groupByKey on the table.

Good to hear you resolved it. It's a little late, but I can confirm
that this is how I would create an in-memory PCollection of Iterables:
create a PTable and then group it by key. The underlying reason it's
(currently) awkward to construct a PCollection of Iterables is that an
Iterable in Crunch isn't something that can be serialized to disk or
read back from disk, so there's typically no need to be able to
construct a PType for it.
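
A minimal sketch of the PTable-then-groupByKey approach, assuming Crunch
is on the classpath. The class name, keys, and values are hypothetical,
and plain Strings stand in for BowtieMapping so the example does not
depend on a generated Avro class; for a specific Avro type the value
PType would presumably be Avros.specifics(BowtieMapping.class).

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.crunch.PGroupedTable;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.impl.mem.MemPipeline;
import org.apache.crunch.types.avro.Avros;

public class GroupedInMemoryExample {

  // Build an in-memory PTable and group it by key. The grouped table is a
  // PCollection<Pair<String, Iterable<String>>>, so no PType for the
  // Iterable itself has to be constructed.
  public static PGroupedTable<String, String> buildGrouped() {
    List<Pair<String, String>> pairs = new ArrayList<Pair<String, String>>();
    pairs.add(Pair.of("read1", "mappingA")); // hypothetical test data
    pairs.add(Pair.of("read1", "mappingB"));
    pairs.add(Pair.of("read2", "mappingC"));

    PTable<String, String> table = MemPipeline.typedTableOf(
        Avros.tableOf(Avros.strings(), Avros.strings()), pairs);
    return table.groupByKey();
  }

  // Count the values grouped under one key by materializing the table.
  public static int countValues(PGroupedTable<String, String> grouped, String key) {
    int count = 0;
    for (Pair<String, Iterable<String>> group : grouped.materialize()) {
      if (group.first().equals(key)) {
        for (String ignored : group.second()) {
          count++;
        }
      }
    }
    return count;
  }
}
```

The grouped table can then be passed to parallelDo with the DoFn under
test, which receives each key together with its Iterable of values.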

FWIW, when I'm writing unit tests for DoFns I usually don't even
create an in-memory PCollection, but instead call the process method
with a mocked Emitter. The biggest issue with this approach is usually
getting the DoFn correctly initialized if it has some custom
initialization logic.
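
A minimal sketch of calling process directly, assuming Crunch is on the
classpath. CountValuesFn and CollectingEmitter are hypothetical names; a
hand-written Emitter that records its output stands in for a mock here
to avoid a dependency on a mocking library. This DoFn has no custom
initialization, so the initialization issue mentioned above does not
arise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.Pair;

public class DoFnDirectCallExample {

  // Hypothetical DoFn under test: emits each key with its number of values.
  public static class CountValuesFn
      extends DoFn<Pair<String, Iterable<String>>, Pair<String, Integer>> {
    @Override
    public void process(Pair<String, Iterable<String>> input,
                        Emitter<Pair<String, Integer>> emitter) {
      int count = 0;
      for (String ignored : input.second()) {
        count++;
      }
      emitter.emit(Pair.of(input.first(), count));
    }
  }

  // Stand-in for a mocked Emitter: records everything that is emitted.
  public static class CollectingEmitter<T> implements Emitter<T> {
    public final List<T> emitted = new ArrayList<T>();

    @Override
    public void emit(T value) {
      emitted.add(value);
    }

    @Override
    public void flush() {
      // Nothing is buffered, so there is nothing to flush.
    }
  }

  // Call process directly, without building a pipeline or a PCollection.
  public static List<Pair<String, Integer>> run() {
    CollectingEmitter<Pair<String, Integer>> emitter =
        new CollectingEmitter<Pair<String, Integer>>();
    Pair<String, Iterable<String>> input =
        Pair.<String, Iterable<String>>of("read1", Arrays.asList("mappingA", "mappingB"));
    new CountValuesFn().process(input, emitter);
    return emitter.emitted;
  }
}
```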

- Gabriel

> On Sat, Jan 11, 2014 at 6:20 PM, Jeremy Lewi wrote:
>> Let's try again,
>> How do I create an in memory collection of iterable avro specific types? I
>> can't seem to figure out how to create a PType for the iterable type.
>> Here's what I'm trying:
>>     ArrayList<BowtieMapping> mappings = new ArrayList<BowtieMapping>();
>>     PCollection<Iterable<BowtieMapping>> example4 =
>>         MemPipeline.typedCollectionOf(
>>             Avros.collections(mappings.getClass()),
>>             mappings);
>> In this case BowtieMapping is the class for my avro specific type.
>> I'm trying to write a unit test for a DoFn.
>> Thanks
>> J
>> On Sat, Jan 11, 2014 at 6:16 PM, Jeremy Lewi wrote:
>>> Hi Crunch Users,
>>> Ho
