spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
Date Tue, 07 Aug 2018 22:15:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572410#comment-16572410
] 

Sean Owen commented on SPARK-25047:
-----------------------------------

More notes. These two Stack Overflow answers shed a little light:

[https://stackoverflow.com/a/28367602/64174]

[https://stackoverflow.com/questions/28079307/unable-to-deserialize-lambda/28084460#28084460]

They suggest that the deserialized SerializedLambda instance should provide a readResolve()
method to, I assume, resolve it back into a scala.Function1, and that this should be backed
by a synthetic {{$deserializeLambda$(SerializedLambda)}} method in the capturing class. It
seems like something isn't turning the SerializedLambda back into the real function.
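To make that mechanism concrete, here is a minimal, self-contained Java sketch of the expected round trip (the class and names are illustrative, not Spark's). On write, the compiler-generated writeReplace() swaps the lambda for a SerializedLambda; on read, SerializedLambda.readResolve() routes back through the capturing class's synthetic $deserializeLambda$ to recover a real function instance:

```java
import java.io.*;
import java.util.function.Function;

public class LambdaRoundTrip {
    // Lambdas are only serializable when their target type is Serializable.
    interface SerFunction<A, B> extends Function<A, B>, Serializable {}

    // Serialize and immediately deserialize a function.
    static SerFunction<Integer, Integer> roundTrip(SerFunction<Integer, Integer> f)
            throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            // writeReplace() swaps the lambda for a java.lang.invoke.SerializedLambda
            oos.writeObject(f);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            // SerializedLambda.readResolve() calls the capturing class's synthetic
            // $deserializeLambda$ method, which re-creates a real SerFunction.
            // If that resolution step failed, the raw SerializedLambda would leak
            // out here -- the ClassCastException seen in this issue.
            return (SerFunction<Integer, Integer>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(x -> x + 1).apply(41)); // prints 42
    }
}
```

In the working case the cast succeeds because readResolve() has already replaced the SerializedLambda; the bug report shows what happens when it hasn't.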

The method is in the byte code of BucketedRandomProjectionLSH and decompiles as
{code:java}
private static /* synthetic */ Object $deserializeLambda$(SerializedLambda serializedLambda) {
    return LambdaDeserialize.bootstrap(new MethodHandle[]{
        $anonfun$hashDistance$1$adapted(scala.Tuple2),
        $anonfun$hashFunction$2$adapted(org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel org.apache.spark.ml.linalg.Vector org.apache.spark.ml.linalg.Vector),
        $anonfun$hashFunction$3$adapted(java.lang.Object),
        $anonfun$hashFunction$1(org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel org.apache.spark.ml.linalg.Vector)
    }, serializedLambda);
}{code}
I traced through this for a while but couldn't make sense of it. However, nothing actually
failed around here; the ultimate error came a bit later, as in the Stack Overflow post above.

There are plenty of fields of type scala.Function1 in Spark, yet this is the only problematic
one, and I can't see why. Is it because it involves an array type? Grepping suggests that
could be unique. However, when I tried to create a repro in a simple standalone class,
everything worked as expected there too.

Something is odd about this case, and I don't know whether it is in fact triggering some odd
corner-case issue in Scala or Java 8, or whether the Spark code could be tweaked to dodge it.
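One way the code could conceivably dodge it (a hypothetical sketch, not an actual Spark patch; the Model class below is illustrative) is to keep the lambda out of Java serialization entirely: mark the field transient and rebuild it after deserialization, which is also roughly what turning the Scala "val" into a "def" would achieve:

```java
import java.io.*;
import java.util.function.Function;

public class TransientLambdaModel {
    static class Model implements Serializable {
        private final double offset;
        // transient: the lambda never enters the serialized form, so no
        // SerializedLambda can ever be assigned to this field on read.
        private transient Function<Double, Double> hashFunction;

        Model(double offset) {
            this.offset = offset;
            this.hashFunction = x -> x + offset;
        }

        double apply(double x) {
            return hashFunction.apply(x);
        }

        // Rebuild the function after default deserialization of the plain fields.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            this.hashFunction = x -> x + offset;
        }
    }

    // Serialize and immediately deserialize a model instance.
    static Model roundTrip(Model m) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(m);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (Model) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Model(1.0)).apply(41.0)); // prints 42.0
    }
}
```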

 

> Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25047
>                 URL: https://issues.apache.org/jira/browse/SPARK-25047
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: Sean Owen
>            Priority: Major
>
> Another distinct test failure:
> {code:java}
> - BucketedRandomProjectionLSH: streaming transform *** FAILED ***
>   org.apache.spark.sql.streaming.StreamingQueryException: Query [id = 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of type scala.Function1 in instance of org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
> ...
>   Cause: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of type scala.Function1 in instance of org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel
>   at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284)
> ...{code}
> Here the different nature of a Java 8 LambdaMetafactory closure trips up Java serialization/deserialization. I think this can be patched by manually implementing the Java serialization here; I don't see other instances (yet).
> Also wondering if this "val" can be a "def".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


