spark-user mailing list archives

From Hemant Bhanawat <hemant9...@gmail.com>
Subject Re: DataFrame First method is resulting different results in each iteration
Date Wed, 03 Feb 2016 10:58:57 GMT
Missing order by?

Hemant Bhanawat
SnappyData (http://snappydata.io/)

On Wed, Feb 3, 2016 at 3:45 PM, satish chandra j <jsatishchandra@gmail.com>
wrote:

> Hi All,
> I have data in an emp_df (DataFrame) as shown below:
>
> EmpId   Sal   DeptNo
> 001       100   10
> 002       120   20
> 003       130   10
> 004       140   20
> 005       150   10
>
> val ordrd_emp_df = emp_df.orderBy($"DeptNo", $"Sal".desc), which results in
> the following:
>
> DeptNo  Sal   EmpId
> 10         150   005
> 10         130   003
> 10         100   001
> 20         140   004
> 20         120   002
>
> Now I want to pick the highest-paid EmpId of each DeptNo, hence I applied
> the agg first method as below:
>
>
> ordrd_emp_df.groupBy("DeptNo").agg($"DeptNo",first("EmpId").as("TopSal")).select($"DeptNo",$"TopSal")
>
> Expected output is:
>
> DeptNo  TopSal
> 10      005
> 20      004
>
> But my output varies on each iteration, such as:
>
> First iteration results in:
>
> Dept  TopSal
> 10    003
> 20    004
>
> Second iteration results in:
>
> Dept  TopSal
> 10    005
> 20    004
>
> Third iteration results in:
>
> Dept  TopSal
> 10    003
> 20    002
>
> Not sure why the output varies on each iteration, as there is no change in
> the code or in the values in the DataFrame.
>
> Please let me know if you have any inputs on this.
>
> Regards,
> Satish Chandra J
>
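
For reference, a minimal sketch of one way to get a deterministic result for the query quoted above. An orderBy placed before groupBy does not guarantee the order in which first sees the rows of each group, because the aggregation shuffles the data; the Spark documentation describes first as non-deterministic for that reason. The sketch ranks the rows inside each department with a window function instead of relying on row order. It assumes Spark 2.x or later (SparkSession), and the names TopSalPerDept and topPaidPerDept are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Hypothetical helper object, not part of Spark.
object TopSalPerDept {

  // Returns one row per DeptNo holding the EmpId with the highest Sal.
  def topPaidPerDept(emp: DataFrame): DataFrame = {
    // Rank rows within each department, highest salary first.
    val bySalDesc = Window.partitionBy(col("DeptNo")).orderBy(col("Sal").desc)

    emp
      .withColumn("rn", row_number().over(bySalDesc))
      .where(col("rn") === 1)                        // keep only the top-ranked row
      .select(col("DeptNo"), col("EmpId").as("TopSal"))
      .orderBy(col("DeptNo"))                        // stable ordering of the output
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession, used here only to make the sketch runnable.
    val spark = SparkSession.builder()
      .appName("TopSalPerDept")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the quoted message.
    val empDf = Seq(
      ("001", 100, 10),
      ("002", 120, 20),
      ("003", 130, 10),
      ("004", 140, 20),
      ("005", 150, 10)
    ).toDF("EmpId", "Sal", "DeptNo")

    topPaidPerDept(empDf).show()
    spark.stop()
  }
}
```

With the sample data from the quoted message this should return EmpId 005 for DeptNo 10 and 004 for DeptNo 20 on every run. If two employees in a department share the top salary, row_number still picks one of them arbitrarily; adding a second sort key such as EmpId to the window ordering would make that choice stable as well.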
