spark-dev mailing list archives

From Olivier Girardot <o.girar...@lateral-thoughts.com>
Subject Re: Dataframe.fillna from 1.3.0
Date Thu, 23 Apr 2015 20:59:42 GMT
I found another way: setting SPARK_HOME to a released version and
launching IPython to load the contexts.
I may need your insight, however. I found out why it wasn't done at the
same time as the rest: this method (like some others) uses varargs in Scala,
and the way functions are currently called only supports a single parameter.

So at first I tried to simply generalise the helper function "_" in the
functions.py file to accept multiple arguments, but Py4J's handling of varargs
forces me to create an Array[Column] when the target method expects
varargs.

But from Python's perspective, we have no way of knowing whether the target
method expects varargs or just multiple arguments (to un-tuple).
I could create a special case for "coalesce", or for "methods that take a
list of columns as arguments", assuming they will be varargs-based (and
therefore need an Array[Column] instead of just a list of arguments).

But this seems very specific and very prone to future mistakes.
Is there any way in Py4J to know a method's signature before calling it?
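Py4J lets Python call into Java reflection, so one possible (untested here) way to inspect a signature before calling it is to look at the `java.lang.reflect.Method` objects. The helper below only uses the standard reflection methods `getName()`, `isVarArgs()` and `getParameterTypes()`. Two things are assumptions: that `methods` could be obtained through something like `sc._jvm.java.lang.Class.forName("org.apache.spark.sql.functions").getMethods()`, and that a Scala `Column*` parameter shows up either as a Java varargs bridge (when annotated with `@scala.annotation.varargs`) or as a trailing `scala.collection.Seq` parameter. The name `takes_varargs` is hypothetical:

```python
# Sketch only: inspects Java reflection objects as seen through Py4J.
# Works on anything exposing getName()/isVarArgs()/getParameterTypes().

# Erased types a Scala repeated parameter (`T*`) is assumed to compile to.
_SEQ_TYPES = {"scala.collection.Seq", "scala.collection.immutable.Seq"}


def takes_varargs(methods, name):
    """Return True if some overload called `name` looks varargs-based.

    A Java-style varargs method reports isVarArgs(); a Scala `T*` parameter
    without @varargs is assumed to appear as a trailing Seq parameter.
    """
    for m in methods:
        if m.getName() != name:
            continue
        if m.isVarArgs():
            return True
        params = m.getParameterTypes()
        # Index explicitly instead of params[-1], in case the array wrapper
        # does not support negative indices.
        if len(params) > 0 and params[len(params) - 1].getName() in _SEQ_TYPES:
            return True
    return False
```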


On Thu, Apr 23, 2015 at 22:17, Olivier Girardot <
o.girardot@lateral-thoughts.com> wrote:

> What is the right way to build and test the PySpark part of Spark?
>
> On Thu, Apr 23, 2015 at 22:06, Olivier Girardot <
> o.girardot@lateral-thoughts.com> wrote:
>
>> Yep :) I'll open the JIRA when I have the time.
>> Thanks
>>
>> On Thu, Apr 23, 2015 at 19:31, Reynold Xin <rxin@databricks.com> wrote:
>>
>>> Ah damn. We need to add it to the Python list. Would you like to give it
>>> a shot?
>>>
>>>
>>> On Thu, Apr 23, 2015 at 4:31 AM, Olivier Girardot <
>>> o.girardot@lateral-thoughts.com> wrote:
>>>
>>>> Yep, no problem, but I can't seem to find the coalesce function in
>>>> pyspark.sql.{*, functions, types or whatever :) }
>>>>
>>>> Olivier.
>>>>
>>>> On Mon, Apr 20, 2015 at 11:48, Olivier Girardot <
>>>> o.girardot@lateral-thoughts.com> wrote:
>>>>
>>>> > A UDF might be a good idea, no?
>>>> >
>>>> > On Mon, Apr 20, 2015 at 11:17, Olivier Girardot <
>>>> > o.girardot@lateral-thoughts.com> wrote:
>>>> >
>>>> >> Hi everyone,
>>>> >> let's assume I'm stuck on 1.3.0: how can I benefit from the *fillna*
>>>> >> API in PySpark? Is there any efficient alternative to mapping the
>>>> >> records myself?
>>>> >>
>>>> >> Regards,
>>>> >>
>>>> >> Olivier.
>>>> >>
>>>> >
>>>>
>>>
>>>
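The fillna question and the UDF and coalesce suggestions in this thread all come down to the same first-non-null logic. The pure-Python sketch below (hypothetical helper names, no Spark required) shows that logic, a one-argument filler of the kind one might wrap in a UDF, and the "map the records myself" baseline over dict-shaped rows:

```python
# Pure-Python sketch of the null-filling logic discussed in the thread.
# All names are hypothetical; no Spark installation is needed to run it.


def coalesce(*values):
    """Return the first argument that is not None, or None if all are None.

    Mirrors the first-non-null semantics of SQL/Spark `coalesce`.
    """
    for v in values:
        if v is not None:
            return v
    return None


def make_filler(default):
    """Build a one-argument function suitable for wrapping in a UDF."""
    def fill(value):
        return coalesce(value, default)
    return fill


def fill_record(record, defaults):
    """Baseline: fill None fields of a dict-shaped row.

    Only columns present in `defaults` are touched, loosely like passing a
    per-column dict of replacement values to fillna.
    """
    return {k: (coalesce(v, defaults[k]) if k in defaults else v)
            for k, v in record.items()}
```

Assuming PySpark 1.3's `pyspark.sql.functions.udf(f, returnType)`, something like `udf(make_filler(0), IntegerType())` could be applied per column. On the JVM side, `coalesce(col, lit(0))` expresses the same fill without a round trip through Python, which is why exposing `coalesce` in PySpark would be the more efficient route.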
