spark-dev mailing list archives

From Olivier Girardot <o.girar...@lateral-thoughts.com>
Subject Re: Nested "struct" function call creates a compilation error in Spark SQL
Date Thu, 15 Jun 2017 20:15:51 GMT
Hi Michael,
Spark 2.0.2, but I actually have a very interesting test case.
The optimiser seems to be at fault in a way. I've attached to this email the
explain output when I limit myself to 2 levels of struct mutation and when it
goes to 5.
As you can see, the optimiser seems to be doing a lot more in the latter case.
After further investigation, the code is not "failing" per se: Spark is
attempting whole-stage codegen, compilation of the generated code fails with
the 64 KB error quoted below, and I think it then falls back to the
"non codegen" path.

I'll try to create a simpler test case to reproduce this if I can. What do
you think?
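
As a starting point for such a test case, here is a minimal sketch in PySpark. It is not the actual failing job: the helper names (nested_struct_expr, explain_nested), the field names and the width parameter are all hypothetical, and I make no claim that this particular shape is enough to cross the 64 KB limit. It only builds a struct nested to a chosen depth so the explain output at 2 and 5 levels can be compared, and it lets you toggle the spark.sql.codegen.wholeStage setting to see whether skipping whole-stage codegen changes anything.

```python
def nested_struct_expr(depth, width=3, source="id"):
    """Build a Spark SQL expression nesting named_struct `depth` levels deep.

    Hypothetical helper: each level holds `width` scalar fields derived from
    `source`, plus one 'child' field holding the next level down.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    expr = None
    # Build from the innermost level outwards.
    for level in range(depth, 0, -1):
        fields = []
        for i in range(width):
            fields.append("'f{0}_{1}', {2} + {1}".format(level, i, source))
        if expr is not None:
            fields.append("'child', {0}".format(expr))
        expr = "named_struct({0})".format(", ".join(fields))
    return expr


def explain_nested(depth, whole_stage=True):
    """Print the extended plan for a select over the nested struct.

    Requires a local Spark installation; pyspark is imported lazily so the
    string builder above can be used without it.
    """
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("nested-struct-repro")
        .getOrCreate()
    )
    # Assumption: turning this off skips the whole-stage compilation attempt.
    spark.conf.set("spark.sql.codegen.wholeStage", str(whole_stage).lower())
    df = spark.range(1).selectExpr(nested_struct_expr(depth) + " AS root")
    df.explain(True)
    return df
```

Calling explain_nested(2) and then explain_nested(5) should make the difference between the two plans visible; increasing width is one way to try to grow the generated code further.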

Regards,

Olivier.


2017-06-15 21:08 GMT+02:00 Michael Armbrust <michael@databricks.com>:

> Which version of Spark?  If it's recent I'd open a JIRA.
>
> On Thu, Jun 15, 2017 at 6:04 AM, Olivier Girardot <
> o.girardot@lateral-thoughts.com> wrote:
>
>> Hi everyone,
>> when we create nested calls to "struct" (up to 5 levels) to extend
>> a complex data structure, we end up with the following compilation error:
>>
>> org.codehaus.janino.JaninoRuntimeException: Code of method
>> "(I[Lscala/collection/Iterator;)V" of class
>> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator"
>> grows beyond 64 KB
>>
>> The CreateStruct code itself properly uses ctx.splitExpressions,
>> but the "end result" of df.select(struct(struct(struct(....))))
>> ends up being too large.
>>
>> Should I open a JIRA or is there a workaround ?
>>
>> Regards,
>>
>> --
>> *Olivier Girardot* | Associé
>> o.girardot@lateral-thoughts.com
>>
>
>


-- 
*Olivier Girardot* | Associé
o.girardot@lateral-thoughts.com
+33 6 24 09 17 94
