crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-339) java.io.NotSerializableException when calling PCollection.cache()
Date Wed, 05 Feb 2014 02:32:10 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891617#comment-13891617 ]

Josh Wills commented on CRUNCH-339:
-----------------------------------

The most likely cause of that error is not the cache() call but a DoFn that holds a reference to
an instance of the OryxRecommender class, which can't be serialized. Check out this section of the
user guide for options for handling non-serializable instances inside of DoFn classes:

http://crunch.apache.org/user-guide.html#dovsmap
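
One option of this kind, sketched here as an assumption rather than a quote from the linked guide: mark the non-serializable field transient so Java serialization skips it, and rebuild it in the DoFn's initialize() method, which Crunch is understood to call on the task side before process(). The sketch is plain Java with no Crunch dependency; LookupFn, ExpensiveClient, roundTrip, and hasClient are hypothetical names, and the explicit initialize() calls stand in for what the framework would do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class TransientFieldDemo {

  // Hypothetical non-serializable helper (think: a recommender or client object).
  static class ExpensiveClient {
    long lookup(String id) {
      return Long.parseLong(id.trim());
    }
  }

  // Hypothetical stand-in for a DoFn. Assumption: the framework calls
  // initialize() after deserialization and before any processing.
  static class LookupFn implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: Java serialization skips this field, so ExpensiveClient
    // never needs to implement Serializable.
    private transient ExpensiveClient client;

    void initialize() {
      client = new ExpensiveClient();
    }

    long process(String id) {
      return client.lookup(id);
    }

    boolean hasClient() {
      return client != null;
    }
  }

  // Serializes and deserializes an object, mimicking shipping it to a task.
  static <T extends Serializable> T roundTrip(T obj, Class<T> type) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return type.cast(in.readObject());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    LookupFn local = new LookupFn();
    local.initialize();  // even with the helper set, serialization succeeds
    LookupFn shipped = roundTrip(local, LookupFn.class);
    System.out.println("helper after deserialization: " + shipped.hasClient());  // false
    shipped.initialize();  // rebuild the helper where it is actually used
    System.out.println("processed: " + shipped.process(" 42 "));  // 42
  }
}
```

The transient field comes back as null after deserialization, which is why the helper has to be recreated on the receiving side rather than in the constructor.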

> java.io.NotSerializableException when calling PCollection.cache()
> -----------------------------------------------------------------
>
>                 Key: CRUNCH-339
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-339
>             Project: Crunch
>          Issue Type: Bug
>          Components: IO
>    Affects Versions: 0.8.2
>         Environment: CDH4.4.0
>            Reporter: Sungwoo Park
>
> While debugging an MRPipeline, I found that calling the cache() function causes a java.io.NotSerializableException.
> Code:
> (...)
> PCollection<Long> candidates = loadCandidate(pipeline, candidatesPath);
> candidates.cache();
> (...)
> public PCollection<Long> loadCandidate(Pipeline p, Path candidatesPath) {
>     if (candidatesPath == null)
>         return null;
>
>     return p.read(From.textFile(candidatesPath)).parallelDo(new DoFn<String, Long>() {
>         @Override
>         public void process(String stringId, Emitter<Long> emitter) {
>             emitter.emit(Long.parseLong(stringId));
>         }
>     }, Writables.longs());
> }
> Stack trace:
> Exception in thread "main" org.apache.crunch.CrunchRuntimeException: java.io.NotSerializableException: com.coupang.recommender.ever.utils.recommender.OryxRecommender
> 	at org.apache.crunch.impl.mr.MRPipeline.plan(MRPipeline.java:104)
> 	at org.apache.crunch.impl.mr.MRPipeline.runAsync(MRPipeline.java:123)
> 	at org.apache.crunch.impl.mr.MRPipeline.run(MRPipeline.java:111)
> 	at org.apache.crunch.impl.dist.DistributedPipeline.done(DistributedPipeline.java:109)
> 	at com.coupang.recommender.ever.utils.recommender.OryxRecommender.run(OryxRecommender.java:108)
> 	at com.coupang.recommender.ever.utils.EverUtil.run(EverUtil.java:30)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at com.coupang.recommender.ever.utils.Driver.main(Driver.java:34)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.io.NotSerializableException: com.coupang.recommender.ever.utils.recommender.OryxRecommender
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1164)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1518)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1483)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1518)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1483)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
> 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
> 	at java.util.ArrayList.writeObject(ArrayList.java:570)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:940)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1469)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1518)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1483)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
> 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
> 	at java.util.ArrayList.writeObject(ArrayList.java:570)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:940)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1469)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
> 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
> 	at org.apache.crunch.util.DistCache.write(DistCache.java:55)
> 	at org.apache.crunch.impl.mr.plan.JobPrototype.serialize(JobPrototype.java:242)
> 	at org.apache.crunch.impl.mr.plan.JobPrototype.build(JobPrototype.java:215)
> 	at org.apache.crunch.impl.mr.plan.JobPrototype.getCrunchJob(JobPrototype.java:134)
> 	at org.apache.crunch.impl.mr.plan.MSCRPlanner.plan(MSCRPlanner.java:165)
> 	at org.apache.crunch.impl.mr.MRPipeline.plan(MRPipeline.java:102)
> 	... 12 more
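
The trace shows the exception surfacing while the pipeline is planned inside done() (MRPipeline.plan, then JobPrototype.serialize, then DistCache.write), where Crunch writes job objects, presumably including the DoFns, with java.io.ObjectOutputStream. A plausible explanation, not confirmed in the thread: loadCandidate appears to be an instance method of OryxRecommender (OryxRecommender.run is in the trace), and an anonymous class created in an instance method carries a hidden reference to its enclosing instance, so serializing the DoFn drags OryxRecommender along. The plain-Java sketch below reproduces that mechanism without Crunch. SerializableFn, Recommender, ParseLongFn, anonymousParser, and serializes are hypothetical stand-ins, and the anonymous class reads an instance field so the capture does not depend on the compiler version (older javac versions kept the hidden field even when it was unused).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class CaptureDemo {

  // Hypothetical stand-in for Crunch's DoFn: a function object that must
  // survive Java serialization when the job is planned.
  interface SerializableFn extends Serializable {
    long apply(String input);
  }

  // Hypothetical stand-in for OryxRecommender: deliberately NOT Serializable.
  static class Recommender {
    private int radix = 10;  // non-final, so reading it requires the enclosing instance

    // Anti-pattern: an anonymous class in an instance method. Because it reads
    // 'radix', it stores a hidden reference to the enclosing Recommender.
    SerializableFn anonymousParser() {
      return new SerializableFn() {
        @Override
        public long apply(String input) {
          return Long.parseLong(input.trim(), radix);
        }
      };
    }

    // Fix: a static nested class has no reference to any Recommender instance.
    static class ParseLongFn implements SerializableFn {
      private static final long serialVersionUID = 1L;

      @Override
      public long apply(String input) {
        return Long.parseLong(input.trim());
      }
    }
  }

  // Writes the object as a planner would before shipping it; returns false
  // if some object reachable from it is not Serializable.
  static boolean serializes(Object fn) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(fn);
      return true;
    } catch (NotSerializableException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Recommender rec = new Recommender();
    System.out.println("anonymous fn serializes: " + serializes(rec.anonymousParser()));
    System.out.println("static nested fn serializes: " + serializes(new Recommender.ParseLongFn()));
  }
}
```

Run as-is, it should report false for the anonymous object and true for the static nested one. In the reported code, the analogous changes would be moving the DoFn into a static nested or top-level class, or declaring loadCandidate static so the anonymous class has no enclosing instance to capture.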



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
