spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Dataframe.fillna from 1.3.0
Date Fri, 24 Apr 2015 20:51:55 GMT
The changes look good to me. Jenkins is somehow not responding. Will merge
once Jenkins comes back happy.


On Fri, Apr 24, 2015 at 2:38 AM, Olivier Girardot <
o.girardot@lateral-thoughts.com> wrote:

> Done: https://github.com/apache/spark/pull/5683 and
> https://issues.apache.org/jira/browse/SPARK-7118
> Thanks
>
> On Fri, Apr 24, 2015 at 07:34, Olivier Girardot <
> o.girardot@lateral-thoughts.com> wrote:
>
>> I'll try, thanks.
>>
>> On Fri, Apr 24, 2015 at 00:09, Reynold Xin <rxin@databricks.com> wrote:
>>
>>> You can do it similar to the way countDistinct is done, can't you?
>>>
>>>
>>> https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L78
>>>
>>>
>>>
>>> On Thu, Apr 23, 2015 at 1:59 PM, Olivier Girardot <
>>> o.girardot@lateral-thoughts.com> wrote:
>>>
>>>> I found another way: setting SPARK_HOME to a released version and
>>>> launching IPython to load the contexts.
>>>> I may need your insight, however. I found why it wasn't done at the
>>>> same time: this method (like some others) uses varargs in Scala, and for
>>>> now the way functions are called supports only one parameter.
>>>>
>>>> So at first I tried to just generalise the helper function "_" in the
>>>> functions.py file to multiple arguments, but py4j's handling of varargs
>>>> forces me to create an Array[Column] if the target method is expecting
>>>> varargs.
>>>>
>>>> But from Python's perspective, we have no idea whether the target
>>>> method expects varargs or just multiple arguments (to un-tuple).
>>>> I can create a special case for "coalesce" or for "methods that take a
>>>> list of columns as arguments", assuming they will be varargs-based (and
>>>> therefore need an Array[Column] instead of just a list of arguments).
>>>>
>>>> But this seems very specific and very prone to future mistakes.
>>>> Is there any way in Py4j to know the signature of a method before
>>>> calling it?
>>>>
>>>>
>>>> On Thu, Apr 23, 2015 at 22:17, Olivier Girardot <
>>>> o.girardot@lateral-thoughts.com> wrote:
>>>>
>>>>> What is the way to build and test the PySpark part of Spark?
>>>>>
>>>>> On Thu, Apr 23, 2015 at 22:06, Olivier Girardot <
>>>>> o.girardot@lateral-thoughts.com> wrote:
>>>>>
>>>>>> Yep :) I'll open the JIRA when I've got the time.
>>>>>> Thanks
>>>>>>
>>>>>> On Thu, Apr 23, 2015 at 19:31, Reynold Xin <rxin@databricks.com> wrote:
>>>>>>
>>>>>>> Ah damn. We need to add it to the Python list. Would you like to
>>>>>>> give it a shot?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 23, 2015 at 4:31 AM, Olivier Girardot <
>>>>>>> o.girardot@lateral-thoughts.com> wrote:
>>>>>>>
>>>>>>>> Yep, no problem, but I can't seem to find the coalesce function in
>>>>>>>> pyspark.sql.{*, functions, types or whatever :) }
>>>>>>>>
>>>>>>>> Olivier.
>>>>>>>>
>>>>>>>> On Mon, Apr 20, 2015 at 11:48, Olivier Girardot <
>>>>>>>> o.girardot@lateral-thoughts.com> wrote:
>>>>>>>>
>>>>>>>> > A UDF might be a good idea, no?
>>>>>>>> >
>>>>>>>> > On Mon, Apr 20, 2015 at 11:17, Olivier Girardot <
>>>>>>>> > o.girardot@lateral-thoughts.com> wrote:
>>>>>>>> >
>>>>>>>> >> Hi everyone,
>>>>>>>> >> let's assume I'm stuck on 1.3.0. How can I benefit from the *fillna* API
>>>>>>>> >> in PySpark? Is there any efficient alternative to mapping the records
>>>>>>>> >> myself?
>>>>>>>> >>
>>>>>>>> >> Regards,
>>>>>>>> >>
>>>>>>>> >> Olivier.
>>>>>>>> >>
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>
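
For readers following the varargs discussion in this thread, here is a
minimal sketch of the special-casing idea, runnable without Spark. It is
only an illustration under stated assumptions: FakeJvmFunctions is a
hypothetical stand-in for the JVM-side object reached through py4j, and
make_unary / make_varargs are made-up names, not the helpers in
functions.py. The point it shows is that the Python side has to decide, per
function name, whether to pass its arguments through one by one or to pack
them into a single sequence for a Scala varargs method.

```python
class FakeJvmFunctions(object):
    """Hypothetical stand-in for the JVM-side functions object (not py4j)."""

    @staticmethod
    def upper(col):
        # Models an ordinary one-argument method.
        return "upper(%s)" % col

    @staticmethod
    def coalesce(cols):
        # Models a Scala varargs method: it must receive ONE sequence.
        if not isinstance(cols, list):
            raise TypeError("coalesce expects a single sequence of columns")
        return "coalesce(%s)" % ", ".join(cols)


def make_unary(name):
    """Wrap a target method that takes exactly one column (hypothetical helper)."""
    def wrapper(col):
        return getattr(FakeJvmFunctions, name)(col)
    wrapper.__name__ = name
    return wrapper


def make_varargs(name):
    """Wrap a varargs target: pack all Python arguments into one list."""
    def wrapper(*cols):
        return getattr(FakeJvmFunctions, name)(list(cols))
    wrapper.__name__ = name
    return wrapper


# The caller must know in advance which wrapper each name needs; that is
# the maintenance risk raised in the thread.
upper = make_unary("upper")
coalesce = make_varargs("coalesce")

print(upper("name"))
print(coalesce("nickname", "name"))
```

Registering "coalesce" with make_unary instead would fail at call time in
this sketch, which mirrors the concern that a per-name special case is easy
to get wrong as new functions are added.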
