spark-dev mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: getting encoder implicits to be more accurate
Date Wed, 26 Oct 2016 21:11:57 GMT
if kryo could transparently be used for subtrees without narrowing the
implicit that would be great

On Wed, Oct 26, 2016 at 5:10 PM, Koert Kuipers <koert@tresata.com> wrote:

> i use kryo for the whole thing currently
>
> it would be better to use it for the subtree
>
> On Wed, Oct 26, 2016 at 5:06 PM, Michael Armbrust <michael@databricks.com>
> wrote:
>
>> You use kryo encoder for the whole thing?  Or just the subtree that we
>> don't have specific encoders for?
>>
>> Also, I'm saying I like the idea of having a kryo fallback.  I don't see
>> the point of narrowing the definition of the implicit.
>>
>> On Wed, Oct 26, 2016 at 1:07 PM, Koert Kuipers <koert@tresata.com> wrote:
>>
>>> for example (the log shows when it creates a kryo encoder):
>>>
>>> scala> implicitly[EncoderEvidence[Option[Seq[String]]]].encoder
>>> res5: org.apache.spark.sql.Encoder[Option[Seq[String]]] =
>>> class[value[0]: array<string>]
>>>
>>> scala> implicitly[EncoderEvidence[Option[Set[String]]]].encoder
>>> dataframe.EncoderEvidence$: using kryo encoder for
>>> scala.Option[Set[String]]
>>> res6: org.apache.spark.sql.Encoder[Option[Set[String]]] =
>>> class[value[0]: binary]
>>>
>>>
>>>
>>>
>>> On Wed, Oct 26, 2016 at 4:00 PM, Koert Kuipers <koert@tresata.com>
>>> wrote:
>>>
>>>> why would generating implicits for ProductN where you also require the
>>>> elements in the Product to have an expression encoder not work?
>>>>
>>>> we do this. and then we have a generic fallback where it produces a
>>>> kryo encoder.
>>>>
>>>> for us the result is that say an implicit for Seq[(Int, Seq[(String,
>>>> Int)])] will create a new ExpressionEncoder(), while an implicit for
>>>> Seq[(Int, Set[(String, Int)])] produces an Encoders.kryoEncoder()
>>>>
>>>> On Wed, Oct 26, 2016 at 3:50 PM, Michael Armbrust <
>>>> michael@databricks.com> wrote:
>>>>
>>>>> Sorry, I realize that set is only one example here, but I don't think
>>>>> that making the type of the implicit more narrow to include only ProductN
>>>>> or something eliminates the issue.  Even with that change, we will fail to
>>>>> generate an encoder with the same error if you, for example, have a field
>>>>> of your case class that is an unsupported type.
>>>>>
>>>>> Short of changing this to compile-time macros, I think we are stuck
>>>>> with this class of errors at runtime.  The simplest solution seems to be to
>>>>> expand the set of things we can handle as much as possible and allow users
>>>>> to turn on a kryo fallback for expression encoders.  I'd be hesitant to
>>>>> make this the default though, as behavior would change with each release
>>>>> that adds support for more types.  I would be very supportive of making
>>>>> this fallback a built-in option though.
>>>>>
>>>>> On Wed, Oct 26, 2016 at 11:47 AM, Koert Kuipers <koert@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> yup, it doesnt really solve the underlying issue.
>>>>>>
>>>>>> we fixed it internally by having our own typeclass that produces
>>>>>> encoders and that does check the contents of the products, but we did this
>>>>>> by simply supporting Tuple1 - Tuple22 and Option explicitly, and not
>>>>>> supporting Product, since we dont have a need for case classes
>>>>>>
>>>>>> if case classes extended ProductN (which they will i think in scala
>>>>>> 2.12?) then we could drop Product and support Product1 - Product22 and
>>>>>> Option explicitly while checking the classes they contain. that would be
>>>>>> the cleanest.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 26, 2016 at 2:33 PM, Ryan Blue <rblue@netflix.com> wrote:
>>>>>>
>>>>>>> Isn't the problem that Option is a Product and the class it contains
>>>>>>> isn't checked? Adding support for Set fixes the example, but the problem
>>>>>>> would happen with any class there isn't an encoder for, right?
>>>>>>>
>>>>>>> On Wed, Oct 26, 2016 at 11:18 AM, Michael Armbrust <
>>>>>>> michael@databricks.com> wrote:
>>>>>>>
>>>>>>>> Hmm, that is unfortunate.  Maybe the best solution is to add
>>>>>>>> support for sets?  I don't think that would be super hard.
>>>>>>>>
>>>>>>>> On Tue, Oct 25, 2016 at 8:52 PM, Koert Kuipers <koert@tresata.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> i am trying to use encoders as a typeclass where if it fails to
>>>>>>>>> find an ExpressionEncoder it falls back to KryoEncoder.
>>>>>>>>>
>>>>>>>>> the issue seems to be that ExpressionEncoder claims a little more
>>>>>>>>> than it can handle here:
>>>>>>>>>   implicit def newProductEncoder[T <: Product : TypeTag]: Encoder[T] =
>>>>>>>>>     Encoders.product[T]
>>>>>>>>>
>>>>>>>>> this "claims" to handle for example Option[Set[Int]], but it
>>>>>>>>> really cannot handle Set so it leads to a runtime exception.
>>>>>>>>>
>>>>>>>>> would it be useful to make this a little more specific? i guess
>>>>>>>>> the challenge is going to be case classes which unfortunately dont extend
>>>>>>>>> Product1, Product2, etc.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Software Engineer
>>>>>>> Netflix
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
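
The typeclass-with-kryo-fallback approach described in the thread could look roughly like the sketch below. The thread only names EncoderEvidence and shows its output, not its source, so everything here is an assumption: the Supported marker typeclass, the LowPriorityEncoderEvidence trait, and the small set of supported types (Int, String, Seq, Option, Tuple2) are hypothetical names chosen for illustration. The only Spark calls used are Encoders.kryo and ExpressionEncoder(), and the fallback relies on the usual Scala 2 low-priority implicit pattern.

```scala
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// Hypothetical marker: compile-time proof that every part of T is a type
// we are willing to hand to an ExpressionEncoder.
sealed trait Supported[T]

object Supported {
  private def instance[T]: Supported[T] = new Supported[T] {}

  implicit val int: Supported[Int] = instance
  implicit val string: Supported[String] = instance

  // Containers are supported only if their contents are, so the check recurses.
  // There is deliberately no instance for Set.
  implicit def seq[A: Supported]: Supported[Seq[A]] = instance
  implicit def option[A: Supported]: Supported[Option[A]] = instance
  implicit def tuple2[A: Supported, B: Supported]: Supported[(A, B)] = instance
}

// Hypothetical typeclass that hands out an Encoder[T].
trait EncoderEvidence[T] {
  def encoder: Encoder[T]
}

// Lower priority: considered for any T, but loses to the instance in the
// companion object whenever that one is applicable.
trait LowPriorityEncoderEvidence {
  implicit def kryoFallback[T: ClassTag]: EncoderEvidence[T] =
    new EncoderEvidence[T] {
      val encoder: Encoder[T] = Encoders.kryo[T]
    }
}

object EncoderEvidence extends LowPriorityEncoderEvidence {
  // Higher priority: only applicable when Supported[T] can be derived.
  implicit def expression[T: Supported: TypeTag]: EncoderEvidence[T] =
    new EncoderEvidence[T] {
      val encoder: Encoder[T] = ExpressionEncoder[T]()
    }
}
```

With these definitions, implicitly[EncoderEvidence[Option[Seq[String]]]] should resolve to the expression instance, while implicitly[EncoderEvidence[Option[Set[String]]]] should fall through to kryo because no Supported[Set[String]] exists. As in the thread, the fallback here serializes the whole value with kryo rather than only the unsupported subtree.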
