crunch-user mailing list archives

From Benjamin Mears
Subject In memory PCollection for use in MRPipeline
Date Wed, 21 Jan 2015 17:01:10 GMT

I'm trying to write a Crunch job that generates a large amount of simulated
data.  To kick the job off, I need some inputs to feed into a DoFn.  These
inputs are essentially dummy values that the DoFn will ignore.  To do this,
I'd like to create an in-memory PCollection and pass it into an MRPipeline,
but when I build one with MemPipeline.collectionOf and call parallelDo on
it, I get an error:

Exception in thread "main" java.lang.IllegalStateException:  named
'null' cannot be serialized
	at org.apache.crunch.impl.mem.collect.MemCollection.verifySerializable(
	at org.apache.crunch.impl.mem.collect.MemCollection.parallelDo(

Is it possible to explicitly declare/instantiate a PCollection to pass
into an MRPipeline?
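
One possible reading of the trace above, sketched in plain Java rather than
Crunch: the failure comes from a serializability check on the function passed
to parallelDo, and the blank class name in the message is consistent with an
anonymous class, whose simple name is always empty in Java. An anonymous class
that reads a field of a non-serializable enclosing object drags that object
into serialization and fails. SerializableFn, Driver, and StandaloneFn below
are hypothetical stand-ins for illustration, not Crunch types, and this is only
a guess at the cause, not a confirmed diagnosis.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationCheck {

    // Hypothetical stand-in for a serializable function type (not a Crunch class).
    public abstract static class SerializableFn implements Serializable {
        public abstract String apply(String input);
    }

    // Returns true if Java serialization accepts the object, false otherwise.
    public static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }

    // Hypothetical driver class that does NOT implement Serializable.
    public static class Driver {
        private final String prefix;

        public Driver(String prefix) {
            this.prefix = prefix;
        }

        // The anonymous class reads Driver's field, so it holds a reference
        // to the enclosing Driver instance, which cannot be serialized.
        public SerializableFn anonymousFn() {
            return new SerializableFn() {
                @Override
                public String apply(String input) {
                    return prefix + input;
                }
            };
        }
    }

    // A static nested class carries only its own fields, so it serializes cleanly.
    public static class StandaloneFn extends SerializableFn {
        private final String prefix;

        public StandaloneFn(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public String apply(String input) {
            return prefix + input;
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Driver("sim-").anonymousFn())); // false
        System.out.println(canSerialize(new StandaloneFn("sim-")));         // true
    }
}
```

If this is what is happening, moving the function into a static nested or
top-level class that holds only serializable fields would be one thing to try.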