spark-user mailing list archives

From Feynman Liang <>
Subject Re: How to speed up MLlib LDA?
Date Tue, 15 Sep 2015 16:26:03 GMT
Hi Marko,

I haven't looked into your case in much detail, but one immediate thought
is: have you tried the OnlineLDAOptimizer? Its implementation and the
resulting LDA model (LocalLDAModel) are quite different (no dependency on
GraphX, and the model is assumed to fit on a single machine), so you may
see performance differences.
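
A rough sketch of what switching optimizers could look like, assuming
Spark 1.5 or later and a corpus already in the RDD[(Long, Vector)] form
that LDA.run expects. The helper names (trainOnline, topicsFor), the
iteration count and the mini-batch fraction are placeholders for
illustration, not values taken from your code:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.{LDA, LocalLDAModel, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical helper: train with the online variational Bayes optimizer
// instead of the default EM optimizer.
// `corpus` holds (document id, term-count vector) pairs.
def trainOnline(corpus: RDD[(Long, Vector)], k: Int): LocalLDAModel = {
  val lda = new LDA()
    .setK(k)
    .setMaxIterations(50) // placeholder value
    .setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(0.05)) // placeholder value
  // With the online optimizer, run() yields a LocalLDAModel.
  lda.run(corpus).asInstanceOf[LocalLDAModel]
}

// Hypothetical helper: topic distribution for one document.
// In 1.5 the API takes an RDD, so the single document is wrapped in one.
def topicsFor(sc: SparkContext, model: LocalLDAModel, doc: Vector): Vector =
  model.topicDistributions(sc.parallelize(Seq((0L, doc)))).first()._2
```

Each call to topicsFor still launches a Spark job for a single document,
so it is worth measuring that overhead separately from the inference
itself.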


On Tue, Sep 15, 2015 at 6:37 AM, Marko Asplund <> wrote:

> While doing some more testing I noticed that loading the persisted model
> from disk (~2 minutes) as well as querying LDA model topic distributions
> (~4 seconds for one document) are quite slow operations.
> Our application is querying LDA model topic distribution (for one doc at a
> time) as part of end-user operation execution flow, so a ~4 second
> execution time is very problematic. Am I using the MLlib LDA API correctly
> or is this just reflecting the current performance characteristics of the
> LDA implementation? My code can be found here:
> For what kinds of use cases are people currently using the LDA
> implementation?
> marko
