spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: createDataFrame allows column names as second param in Python not in Scala
Date Sun, 03 May 2015 05:44:14 GMT
Part of the reason is that it is really easy to just call toDF in Scala,
and we already have a lot of createDataFrame overloads.
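
A minimal sketch of the toDF route, assuming Spark 1.3 or later on the classpath and the `SQLContext` API used in the quoted messages. The object name `ToDFExample` and the helper `columnNames` are hypothetical, introduced only for illustration. It builds a DataFrame from a list of tuples, then renames the default `_1`/`_2` columns positionally with `toDF(colNames: String*)`:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical example object; not part of Spark.
object ToDFExample {
  // Returns the column names before and after renaming with toDF.
  def columnNames(sqlContext: SQLContext): (Seq[String], Seq[String]) = {
    val data = List(("Alice", 1), ("Wonderland", 0))

    // Without names, createDataFrame derives the columns from the tuple
    // fields, so they come out as _1 and _2.
    val unnamed = sqlContext.createDataFrame(data)

    // toDF(colNames: String*) renames the columns positionally, covering
    // the same use case as Python's createDataFrame(l, ['name', 'age']).
    val named = unnamed.toDF("name", "score")

    (unnamed.columns.toSeq, named.columns.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // A local context for illustration; spark-shell already provides sc and sqlContext.
    val sc = new SparkContext(new SparkConf().setAppName("toDF-example").setMaster("local[1]"))
    val sqlContext = new SQLContext(sc)

    val (before, after) = columnNames(sqlContext)
    println(before.mkString(", ")) // _1, _2
    println(after.mkString(", "))  // name, score

    // The implicits also add toDF to RDDs of tuples, so the renaming can be
    // done in a single call on an RDD.
    import sqlContext.implicits._
    val fromRdd = sc.parallelize(List(("Alice", 1))).toDF("name", "score")
    println(fromRdd.columns.mkString(", ")) // name, score

    sc.stop()
  }
}
```

Chaining `toDF` after `createDataFrame` keeps the Scala API surface small: one rename method works on any DataFrame instead of adding another `createDataFrame` overload per input type.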

(You might find some of the cross-language differences confusing, but I'd
argue that most real users stick to one language; only developers and
trainers need to switch between languages constantly.)

On Sat, May 2, 2015 at 11:05 AM, Olivier Girardot
<o.girardot@lateral-thoughts.com> wrote:

> Hi everyone,
> SQLContext.createDataFrame behaves differently in Scala and Python:
>
> >>> l = [('Alice', 1)]
> >>> sqlContext.createDataFrame(l).collect()
> [Row(_1=u'Alice', _2=1)]
> >>> sqlContext.createDataFrame(l, ['name', 'age']).collect()
> [Row(name=u'Alice', age=1)]
>
> And in Scala:
>
> scala> val data = List(("Alice", 1), ("Wonderland", 0))
> scala> sqlContext.createDataFrame(data, List("name", "score"))
> <console>:28: error: overloaded method value createDataFrame with
> alternatives: ... cannot be applied to ...
>
> What do you think about also allowing a Seq of column names in Scala,
> for the sake of consistency?
>
> Regards,
>
> Olivier.
>
