spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: Is It Feasible for Spark 1.1 Broadcast to Fully Utilize the Ethernet Card Throughput?
Date Fri, 09 Jan 2015 18:59:07 GMT
In the current implementation of TorrentBroadcast, the blocks are
fetched one by one in a single thread, so it cannot fully utilize
the network bandwidth.

Davies
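
To illustrate the point above, here is a minimal Python sketch (not Spark's
actual code) comparing one-by-one block fetching in a single thread with
fetching several blocks concurrently. Everything in it is an assumption made
for illustration: fetch_block, BLOCK_LATENCY_S, and max_workers are
hypothetical names, and the remote fetch is simulated with a short sleep
rather than real network I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-block delay standing in for a network round trip.
BLOCK_LATENCY_S = 0.02


def fetch_block(blocks, index):
    """Simulated remote fetch: wait, then return the bytes of one block."""
    time.sleep(BLOCK_LATENCY_S)
    return blocks[index]


def fetch_sequential(blocks):
    """Fetch blocks one at a time in the calling thread."""
    return b"".join(fetch_block(blocks, i) for i in range(len(blocks)))


def fetch_concurrent(blocks, max_workers=4):
    """Fetch several blocks at once; Executor.map keeps the block order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = pool.map(lambda i: fetch_block(blocks, i), range(len(blocks)))
        return b"".join(parts)


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # 16 small dummy blocks; real broadcast blocks would be far larger.
    demo_blocks = [bytes([i]) * 1024 for i in range(16)]
    _, t_seq = timed(fetch_sequential, demo_blocks)
    _, t_con = timed(fetch_concurrent, demo_blocks)
    print(f"sequential: {t_seq:.3f}s, concurrent: {t_con:.3f}s")
```

In the sequential version each fetch has to finish before the next one starts,
so the per-block wait adds up across all blocks; the concurrent version
overlaps those waits. How much this would help on a real cluster depends on
the network and on the sender side, which this sketch does not model.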

On Fri, Jan 9, 2015 at 2:11 AM, Jun Yang <yangjunpro@gmail.com> wrote:
> Guys,
>
> I have a question regarding the Spark 1.1 broadcast implementation.
>
> In our pipeline, we have a large multi-class LR model, which is about 1GiB
> in size.
> To benefit from Spark's parallelism, a natural approach is to
> broadcast this model file to the worker nodes.
>
> However, broadcast performance does not look very good.
>
> While the model file was being broadcast, I monitored the
> network card throughput of the worker nodes; their
> recv/write throughput is only around 30~40 MiB (our server box is equipped
> with a 100MiB ethernet card).
>
> Is this a real limitation of the Spark 1.1 broadcast implementation? Or are
> there some configuration settings or tricks
> that can help Spark broadcast perform better?
>
> Thanks
>
>
>
> --
> yangjunpro@gmail.com
> http://hi.baidu.com/yjpro
