spark-user mailing list archives

From Eugene Morozov <>
Subject Re: RDDs caching in typical machine learning use cases
Date Mon, 04 Apr 2016 20:35:34 GMT

Yes, I believe people do that. I also believe Spark ML can figure out on its
own when to cache some internal RDDs; that's definitely true for the random
forest algorithm. Caching the same RDD twice doesn't do any harm, either.
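To illustrate why caching matters for a split / train / evaluate workflow, here is a toy sketch in plain Python. It is NOT Spark's API: the class `LazyDataset` and the helpers `make_source`, `train`, `evaluate` and `run_pipeline` are hypothetical stand-ins that only mimic Spark's lazy evaluation, where an uncached dataset is recomputed from its source on every action. An iterative trainer (here, a few epochs of gradient descent for y = w * x) triggers one action per epoch, so without caching the "expensive" parse of the raw input reruns every time. In real Spark the equivalent step would be calling `cache()` or `persist()` on the parsed or training RDD (for example after `randomSplit`).

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y)


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark's API): data is
    recomputed from `compute` on every collect() unless cache() was called."""

    def __init__(self, compute: Callable[[], List[Point]]):
        self._compute = compute
        self._cache_requested = False
        self._cached: Optional[List[Point]] = None

    def cache(self) -> "LazyDataset":
        # Only marks the dataset; calling it a second time is a no-op.
        self._cache_requested = True
        return self

    def collect(self) -> List[Point]:
        if self._cached is not None:
            return self._cached
        data = self._compute()  # re-run the whole lineage
        if self._cache_requested:
            self._cached = data  # materialize on the first action only
        return data


def make_source(passes: List[int]) -> Callable[[], List[Point]]:
    def parse() -> List[Point]:
        passes[0] += 1  # count full passes over the (pretend-expensive) raw input
        return [(float(x), 2.0 * x) for x in range(10)]
    return parse


def train(train_ds: LazyDataset, epochs: int = 20, lr: float = 0.01) -> float:
    w = 0.0
    for _ in range(epochs):
        pts = train_ds.collect()  # one action per iteration, like iterative ML algorithms
        grad = sum((w * x - y) * x for x, y in pts) / len(pts)
        w -= lr * grad
    return w


def evaluate(w: float, test_ds: LazyDataset) -> float:
    pts = test_ds.collect()
    return sum((w * x - y) ** 2 for x, y in pts) / len(pts)  # mean squared error


def run_pipeline(use_cache: bool, epochs: int = 20) -> Tuple[float, float, int]:
    passes = [0]
    raw = LazyDataset(make_source(passes))
    if use_cache:
        raw.cache()
        raw.cache()  # caching twice does no harm
    # Deterministic toy split: every 5th point goes to the test set.
    train_ds = LazyDataset(lambda: [p for i, p in enumerate(raw.collect()) if i % 5 != 0])
    test_ds = LazyDataset(lambda: [p for i, p in enumerate(raw.collect()) if i % 5 == 0])
    w = train(train_ds, epochs)
    mse = evaluate(w, test_ds)
    return w, mse, passes[0]


if __name__ == "__main__":
    print("raw passes without cache:", run_pipeline(use_cache=False)[2])
    print("raw passes with cache:   ", run_pipeline(use_cache=True)[2])
```

With 20 epochs the uncached run parses the raw input 21 times (20 training actions plus one evaluation), while the cached run parses it once; the fitted weight is identical either way. In Spark itself, calling `cache()` again with the same storage level is likewise a no-op, whereas asking for a different storage level on an already-persisted RDD raises an error.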

But it's not clear what you'd like to know...

Be well!
Jean Morozov

On Sun, Apr 3, 2016 at 11:34 AM, Sergey <> wrote:

> Hi Spark ML experts!
> Do you use RDD caching together with MLlib to speed up
> computation?
> I mean typical machine learning use cases.
> Train-test split, train, evaluate, apply model.
> Sergey.
