spark-user mailing list archives

From Evert Lammerts <evert.lamme...@gmail.com>
Subject Re: selecting columns with the same name in a join
Date Sun, 13 Sep 2015 10:01:46 GMT
Thanks Michael, we'll update then.

Evert
On Sep 11, 2015 20:59, "Michael Armbrust" <michael@databricks.com> wrote:

> Here is what I get on branch-1.5:
>
> x = sc.parallelize([dict(k=1, v="Evert"), dict(k=2, v="Erik")]).toDF()
> y = sc.parallelize([dict(k=1, v="Ruud"), dict(k=3, v="Vincent")]).toDF()
> x.registerTempTable('x')
> y.registerTempTable('y')
> sqlContext.sql("select y.v, x.v FROM x INNER JOIN y ON x.k=y.k").collect()
>
> Out[1]: [Row(v=u'Ruud', v=u'Evert')]
>
> On Fri, Sep 11, 2015 at 3:14 AM, Evert Lammerts <evert.lammerts@gmail.com>
> wrote:
>
>> Am I overlooking something? This doesn't seem right:
>>
>> x = sc.parallelize([dict(k=1, v="Evert"), dict(k=2, v="Erik")]).toDF()
>> y = sc.parallelize([dict(k=1, v="Ruud"), dict(k=3, v="Vincent")]).toDF()
>> x.registerTempTable('x')
>> y.registerTempTable('y')
>> sqlContext.sql("select y.v, x.v FROM x INNER JOIN y ON x.k=y.k").collect()
>>
>> Out[26]: [Row(v=u'Evert', v=u'Evert')]
>>
>> May just be because I'm behind; I'm on:
>>
>> Spark 1.5.0-SNAPSHOT (git revision 27ef854) built for Hadoop 2.6.0 Build
>> flags: -Pyarn -Psparkr -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive
>> -Phive-thriftserver -DskipTests
>>
>> Can somebody check whether the above code does work on the latest release?
>>
>> Thanks!
>> Evert
>>
>
>
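As a point of reference, here is the same query run against SQLite through Python's standard-library sqlite3 module instead of Spark. SQLite is swapped in only as a baseline for standard SQL join semantics; this is not Spark code. The sketch loads the same two tables and shows that "select y.v, x.v" should return y's value first and then x's. That matches Michael's branch-1.5 output, not the snapshot output in the original report. It also shows that aliasing the columns gives them distinct names in the result. The alias names y_v and x_v are hypothetical and were chosen for illustration.

```python
import sqlite3

# Baseline check with SQLite (not Spark): same data and query as in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (k INTEGER, v TEXT)")
conn.execute("CREATE TABLE y (k INTEGER, v TEXT)")
conn.executemany("INSERT INTO x VALUES (?, ?)", [(1, "Evert"), (2, "Erik")])
conn.executemany("INSERT INTO y VALUES (?, ?)", [(1, "Ruud"), (3, "Vincent")])

# Unaliased: both selected columns come from columns named "v",
# but standard SQL still returns y.v first and x.v second.
same_name_rows = conn.execute(
    "SELECT y.v, x.v FROM x INNER JOIN y ON x.k = y.k"
).fetchall()

# Aliased: AS gives each output column its own name (hypothetical names).
aliased = conn.execute(
    "SELECT y.v AS y_v, x.v AS x_v FROM x INNER JOIN y ON x.k = y.k"
)
aliased_cols = [d[0] for d in aliased.description]
aliased_rows = aliased.fetchall()
conn.close()

print(same_name_rows)
print(aliased_cols, aliased_rows)
```

Only k=1 appears in both tables, so the single joined row pairs Ruud (from y) with Evert (from x). Spark SQL also accepts the same AS aliases in the query string passed to sqlContext.sql, which avoids two output columns both named v.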
