spark-user mailing list archives

From vincent gromakowski <vincent.gromakow...@gmail.com>
Subject Re: Performance of Spark when the compute and storage are separated
Date Sat, 14 Apr 2018 20:06:18 GMT
Not with Hadoop, but with Cassandra I have seen a 20x improvement from data
locality on partition-optimized Spark jobs.
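
For a rough feel of why a full scan over remote storage can be slower than one against local disks, here is a back-of-envelope sketch. It is not a benchmark: the function names and every number are hypothetical placeholders, and it assumes that each node scans its share of the data in parallel and that remote reads are capped by whichever is smaller, the total NIC bandwidth or the storage system's egress bandwidth.

```python
def local_scan_seconds(data_gb, nodes, disk_gb_per_s):
    """Full-scan time when every node reads its own share from local disk."""
    return data_gb / (nodes * disk_gb_per_s)


def remote_scan_seconds(data_gb, nodes, nic_gb_per_s, storage_egress_gb_per_s):
    """Full-scan time when all data is pulled over the network.

    Aggregate throughput is limited by the smaller of the combined NIC
    bandwidth of the compute nodes and the egress bandwidth of the
    remote storage system.
    """
    aggregate = min(nodes * nic_gb_per_s, storage_egress_gb_per_s)
    return data_gb / aggregate


if __name__ == "__main__":
    # Hypothetical figures: 10 TB of data, 20 nodes, 1 GB/s of local disk
    # throughput per node, 10 GbE NICs (1.25 GB/s), 10 GB/s storage egress.
    print(local_scan_seconds(10_000, 20, 1.0))
    print(remote_scan_seconds(10_000, 20, 1.25, 10.0))
```

With these made-up inputs the remote scan is limited by storage egress rather than by the NICs, so adding compute nodes would not shorten it. Plug in measured figures for your own cluster before drawing any conclusion.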

On Sat, 14 Apr 2018 at 21:17, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Hi,
>
> This is a "your mileage may vary" type of question.
>
> In a classic Hadoop cluster, one has data locality when each node hosts both
> the Spark libraries and the HDFS data. This helps certain queries, such as
> interactive BI.
>
> However, running Spark over remote storage, say Isilon scale-out NAS
> instead of local HDFS, becomes problematic. The full scan that Spark needs to
> do will take much longer over the network (accessing the remote Isilon
> storage) than with local I/O requests to HDFS.
>
> Has anyone done some comparative studies on this?
>
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
