spark-issues mailing list archives

From "Nick Hryhoriev (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-20080) Spark streaming application does not throw serialisation exception in foreachRDD
Date Fri, 24 Mar 2017 14:37:41 GMT

     [ https://issues.apache.org/jira/browse/SPARK-20080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Hryhoriev updated SPARK-20080:
-----------------------------------
    Description: 
Steps to reproduce:
1) Use a Spark Streaming application.
2) Inside foreachRDD of any DStream, call rdd.foreachPartition.
3) Inside the foreachPartition action, use org.slf4j.Logger: either initialize it there, or reference it as a field captured by the closure.

Result:
No exception is thrown, and the foreachPartition action is not executed.
Expected result:
Caused by: java.io.NotSerializableException: ReproduceBugMain$

Description:
When I use or initialize org.slf4j.Logger inside foreachPartition, where that call is extracted into a trait method invoked from foreachRDD, I have found that the foreachPartition action does not execute and no exception appears.
Tested with Spark in both local and YARN mode.

The code can be found on [github|https://github.com/GrigorievNick/Spark2_1TraitLoggerSerialisationBug/tree/9da55393850df9fe19f5ff3e63b47ec2d1f67e17].
There are two main classes that demonstrate the problem; a minimal sketch follows.
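For reference, here is a minimal sketch of the streaming case, reconstructed from the stack trace below and the class names in the linked repo. The input source (queueStream) and the logging call are assumptions for illustration, not the exact repo code:

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.slf4j.{Logger, LoggerFactory}
import scala.collection.mutable

// The logger lives on the trait, so the closure passed to foreachPartition
// drags in $outer (the implementing object), which is not Serializable.
trait TraitWithMethod {
  val logger: Logger = LoggerFactory.getLogger(getClass)

  def executeForEachpartitoin(rdd: RDD[Int]): Unit =
    rdd.foreachPartition { partition =>
      partition.foreach(i => logger.info(s"value: $i")) // references $outer
    }
}

object ReproduceBugMain extends TraitWithMethod {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("repro").setMaster("local[2]"), Seconds(1))
    // queueStream is just a stand-in source for the sketch.
    val stream = ssc.queueStream(mutable.Queue(ssc.sparkContext.makeRDD(1 to 10)))
    // Expected: Task not serializable. Observed: nothing is thrown and the
    // partitions are never processed.
    stream.foreachRDD(rdd => executeForEachpartitoin(rdd))
    ssc.start()
    ssc.awaitTermination()
  }
}
{code}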

If I run the same code as a batch job, I get the expected exception:
{code:java}
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:924)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:923)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:923)
    at TraitWithMethod$class.executeForEachpartitoin(TraitWithMethod.scala:12)
    at ReproduceBugMain$.executeForEachpartitoin(ReproduceBugMain.scala:7)
    at ReproduceBugMain$.main(ReproduceBugMain.scala:14)
    at ReproduceBugMain.main(ReproduceBugMain.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.io.NotSerializableException: ReproduceBugMain$
Serialization stack:
    - object not serializable (class: ReproduceBugMain$, value: ReproduceBugMain$@3935e9a8)
    - field (class: TraitWithMethod$$anonfun$executeForEachpartitoin$1, name: $outer, type:
interface TraitWithMethod)
    - object (class TraitWithMethod$$anonfun$executeForEachpartitoin$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
    ... 18 more
{code}
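The batch entry point that produces the trace above can be sketched as follows (again reconstructed from the stack trace; in the repo the streaming and batch variants live in separate commits, and the RDD contents here are illustrative):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Same trait as in the streaming sketch. Driving it from a plain batch job
// makes the ClosureCleaner fail fast with the SparkException shown above.
object ReproduceBugMain extends TraitWithMethod {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("repro-batch").setMaster("local[2]"))
    executeForEachpartitoin(sc.makeRDD(1 to 10)) // throws: Task not serializable
  }
}
{code}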

The GitHub repository contains two commits: the initial one, which the link above points to, contains the streaming example; the last one contains the batch job example.
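Note that the serialization failure itself is expected behaviour: the anonymous function passed to foreachPartition reads the logger through the trait, so its $outer field points at ReproduceBugMain$, which is not Serializable. A common way to avoid the capture (an illustration, not part of this report) is to create the logger inside the partition function, so the closure no longer references the enclosing instance:

{code:scala}
import org.apache.spark.rdd.RDD
import org.slf4j.LoggerFactory

trait TraitWithLocalLogger {
  def executeForEachPartition(rdd: RDD[Int]): Unit =
    rdd.foreachPartition { partition =>
      // Local to the closure: nothing from the enclosing trait is captured,
      // and the logger is created on the executor, once per partition.
      val logger = LoggerFactory.getLogger("partition-logger")
      partition.foreach(i => logger.info(s"value: $i"))
    }
}
{code}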


> Spark streaming application does not throw serialisation exception in foreachRDD
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-20080
>                 URL: https://issues.apache.org/jira/browse/SPARK-20080
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.1.0
>         Environment: local spark and yarn from big top 1.1.0 version
>            Reporter: Nick Hryhoriev
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

