spark-user mailing list archives

From "Jan Brabec (janbrabe)" <>
Subject Mutating broadcast variable from executors, any risks even if done in a thread-safe manner?
Date Tue, 12 Mar 2019 09:39:25 GMT

I have a rather specific use case. I want to use an MXNet neural-net model in a distributed fashion
to get predictions on a very large dataset. It is not possible to broadcast the model directly
because the underlying implementation is not serializable. Instead, the model has to be loaded
directly on the executors. What we do at the moment (and it works) is broadcast a wrapper
class, and the model is loaded inside it into a lazy val on first use. This is nice because
we do not need to load the model for each partition, only once per executor, which makes
the job more efficient. However, because initializing the lazy val updates the wrapper, we are
in effect mutating the broadcast variable on the executors, albeit in a thread-safe manner.
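A minimal sketch of the pattern described above, written in plain Java with no Spark or MXNet dependency. Java serialization stands in for shipping the broadcast wrapper to an executor, and threads stand in for tasks sharing that executor's copy. `HeavyModel`, `loadModel`, `modelPath` and `roundTrip` are hypothetical placeholders, not MXNet or Spark APIs. The explicit double-checked locking plays the role of Scala's thread-safe `lazy val`. Marking the model field `transient` is an assumption of this sketch: it keeps the wrapper serializable even if it is serialized again after the model has been loaded.

```java
import java.io.*;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyModelWrapperDemo {

    // Hypothetical stand-in for the non-serializable MXNet model.
    static final class HeavyModel {
        float[] predict(float[] input) { return input.clone(); }
    }

    // Counts how many times the hypothetical loader has run in this JVM.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Hypothetical loader; in the real job this would load the model files on the executor.
    static HeavyModel loadModel(String path) {
        LOADS.incrementAndGet();
        return new HeavyModel();
    }

    // The broadcast wrapper: only the path is serialized. The model is transient
    // and created on first use, guarded by double-checked locking on a volatile field.
    static final class ModelWrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String modelPath;
        private transient volatile HeavyModel model;

        ModelWrapper(String modelPath) { this.modelPath = modelPath; }

        HeavyModel get() {
            HeavyModel m = model;
            if (m == null) {
                synchronized (this) {
                    m = model;
                    if (m == null) {
                        m = loadModel(modelPath); // runs at most once per deserialized copy
                        model = m;
                    }
                }
            }
            return m;
        }
    }

    // Simulates shipping the wrapper to an executor: serialize, then deserialize a copy.
    static ModelWrapper roundTrip(ModelWrapper w) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(w);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ModelWrapper) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ModelWrapper onExecutor = roundTrip(new ModelWrapper("/models/example"));
        // Several concurrent "tasks" share the executor's single copy of the wrapper.
        Thread[] tasks = new Thread[8];
        for (int i = 0; i < tasks.length; i++) {
            tasks[i] = new Thread(() -> onExecutor.get().predict(new float[] {1f, 2f}));
            tasks[i].start();
        }
        for (Thread t : tasks) {
            t.join();
        }
        System.out.println("model loads: " + LOADS.get());
    }
}
```

Running `main` should report a single load even though eight threads call `get()` concurrently, which is the once-per-executor behaviour the message relies on. The only state that changes after deserialization is the transient, lazily filled `model` field.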

I understand that broadcast variables are meant to hold immutable values, but that is simply not
convenient for us. Are there any risks in what we are doing, are we shooting ourselves in the foot,
and is there a better way to achieve what we want?
