crunch-user mailing list archives

From Benjamin Mears <benjaminmme...@gmail.com>
Subject In memory PCollection for use in MRPipeline
Date Wed, 21 Jan 2015 17:01:10 GMT
Hi,

I'm trying to write a Crunch job that generates a large amount of simulated
data. To kick the job off, I need inputs for a DoFn. These inputs are just
dummy values that the DoFn ignores. To accomplish this, I'd like to create
an in-memory PCollection that I can then pass into an MRPipeline, but when I
do this with MemPipeline.collectionOf I get an error:

Exception in thread "main" java.lang.IllegalStateException:  named 'null' cannot be serialized
	at org.apache.crunch.impl.mem.collect.MemCollection.verifySerializable(MemCollection.java:110)
	at org.apache.crunch.impl.mem.collect.MemCollection.parallelDo(MemCollection.java:129)
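For context, here is a hedged sketch of the kind of check behind this message. The trace shows MemCollection.parallelDo calling MemCollection.verifySerializable, which rejects a function that cannot be serialized. Assuming that check amounts to plain Java serialization of the function object, a common cause is an anonymous inner DoFn that holds a reference to a non-serializable enclosing class. The example is plain Java with no Crunch dependency; Fn, IgnoreInputFn, Driver and isSerializable are hypothetical names, with Fn standing in for DoFn. It is not Crunch's actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializabilityCheck {

    // Hypothetical stand-in for a DoFn-like function object (not the Crunch API).
    interface Fn extends Serializable {
        String apply(String input);
    }

    // Static nested class: holds no reference to an enclosing instance, so it serializes.
    static class IgnoreInputFn implements Fn {
        @Override
        public String apply(String input) {
            return "simulated"; // ignores its dummy input
        }
    }

    // Deliberately NOT Serializable, like a typical job driver class.
    static class Driver {
        String prefix = "simulated-";

        Fn makeAnonymousFn() {
            // The anonymous class reads the outer field 'prefix', so it captures
            // Driver.this, and Driver cannot be serialized.
            return new Fn() {
                @Override
                public String apply(String input) {
                    return prefix + "value";
                }
            };
        }
    }

    // Returns true if Java serialization of the object succeeds.
    static boolean isSerializable(Object fn) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(fn);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSerializable(new IgnoreInputFn()));            // true
        System.out.println(isSerializable(new Driver().makeAnonymousFn())); // false
    }
}
```

If this is the cause, declaring the DoFn as a static nested or top-level class may avoid the captured reference.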

Is it possible to explicitly declare/instantiate a PCollection to pass
into an MRPipeline?

Thanks!

-Ben
