spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Why dataframe can be more efficient than dataset?
Date Sat, 08 Apr 2017 19:34:50 GMT
As far as I am aware, in newer Spark versions (2.0 and later) a DataFrame is the same as
Dataset[Row]. Performance also depends on so many factors that I am not sure such a
comparison makes sense.
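
A minimal sketch of the two points in this thread, assuming Spark 2.x on the classpath and a local session. The `Person` case class, the object name, and the sample rows are hypothetical and only serve the illustration. It shows that a DataFrame can be assigned to a Dataset[Row] without conversion, and it prints the plans for a relational filter and a lambda filter so they can be compared by eye:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Long)

object DataFrameVsDatasetSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session is good enough for inspecting plans.
    val spark = SparkSession.builder()
      .appName("df-vs-ds-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val ds: Dataset[Person] = Seq(Person("a", 30L), Person("b", 17L)).toDS()

    // In the Scala API of Spark 2.x, DataFrame is a type alias for Dataset[Row],
    // so this assignment needs no conversion.
    val df: DataFrame = ds.toDF()
    val sameThing: Dataset[Row] = df

    // Relational transformation: a Column expression that Catalyst can analyze.
    val relational = ds.filter($"age" >= 18)

    // Functional transformation: a Scala closure that Catalyst treats as opaque.
    val functional = ds.filter((p: Person) => p.age >= 18)

    // Print the logical and physical plans of both variants for comparison.
    relational.explain(true)
    functional.explain(true)

    spark.stop()
  }
}
```

Comparing plans this way only shows what the optimizer can see. It is not a benchmark, and actual run times will depend on the data, the cluster, and the Spark version.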

> On 8. Apr 2017, at 20:15, Shiyuan <gshy2014@gmail.com> wrote:
> 
> Hi Spark-users, 
>     I came across a few sources that mention that a DataFrame can be more efficient than
> a Dataset. I can understand why this is true when a Dataset uses functional transformations,
> which Catalyst cannot look into and hence cannot optimize well. But can a DataFrame be more
> efficient than a Dataset even if we use only relational transformations on the Dataset? If
> so, can anyone explain why? Is there any benchmark comparing Dataset vs. DataFrame? Thank you!
> 
> Shiyuan 
