spark-user mailing list archives

From Feynman Liang <fli...@databricks.com>
Subject Re: How to speed up MLlib LDA?
Date Tue, 15 Sep 2015 16:26:03 GMT
Hi Marko,

I haven't looked into your case in much detail, but one immediate thought
is: have you tried the OnlineLDAOptimizer? Its implementation and the
resulting LDA model (LocalLDAModel) are quite different (it doesn't depend
on GraphX and assumes the model fits on a single machine), so you may see
performance differences.
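
A minimal sketch of what switching optimizers might look like, assuming a
Spark version in which LocalLDAModel exposes topicDistributions. The toy
corpus, the object and helper names (OnlineLdaSketch, trainOnline,
topicsFor), and the parameter values are all made up for illustration and
are not taken from your code:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{LDA, LocalLDAModel, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

object OnlineLdaSketch {

  // Train with the online variational optimizer instead of the default EM one.
  // With this optimizer, LDA.run is expected to return a LocalLDAModel.
  def trainOnline(corpus: RDD[(Long, Vector)], k: Int): LocalLDAModel = {
    val lda = new LDA()
      .setK(k)
      .setMaxIterations(20) // illustrative value, not a tuned setting
      // Use every document in each mini-batch because the toy corpus is tiny;
      // a real corpus would normally use a smaller fraction.
      .setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(1.0))

    lda.run(corpus) match {
      case local: LocalLDAModel => local
      case other =>
        throw new IllegalStateException(s"unexpected model type: ${other.getClass}")
    }
  }

  // Query the topic distribution of one document by wrapping it in a
  // single-element RDD (the document id 0L is arbitrary).
  def topicsFor(sc: SparkContext, model: LocalLDAModel, doc: Vector): Vector =
    model.topicDistributions(sc.parallelize(Seq((0L, doc)))).first()._2

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("online-lda-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical corpus: (document id, term counts) over a 5-word vocabulary.
      val corpus: RDD[(Long, Vector)] = sc.parallelize(Seq(
        (0L, Vectors.dense(3.0, 2.0, 0.0, 0.0, 1.0)),
        (1L, Vectors.dense(2.0, 4.0, 1.0, 0.0, 0.0)),
        (2L, Vectors.dense(0.0, 0.0, 3.0, 4.0, 1.0)),
        (3L, Vectors.dense(0.0, 1.0, 2.0, 5.0, 0.0))
      )).cache()

      val model = trainOnline(corpus, k = 2)
      val dist = topicsFor(sc, model, Vectors.dense(1.0, 2.0, 0.0, 0.0, 0.0))
      println(s"topic distribution: $dist")
    } finally {
      sc.stop()
    }
  }
}
```

If the per-document query is part of a user-facing request, it is probably
worth training or loading the model once and keeping it in memory, so that
only the topicsFor step runs on each request.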

Feynman

On Tue, Sep 15, 2015 at 6:37 AM, Marko Asplund <marko.asplund@gmail.com>
wrote:

>
> While doing some more testing I noticed that loading the persisted model
> from disk (~2 minutes) as well as querying LDA model topic distributions
> (~4 seconds for one document) are quite slow operations.
>
> Our application is querying LDA model topic distribution (for one doc at a
> time) as part of end-user operation execution flow, so a ~4 second
> execution time is very problematic. Am I using the MLlib LDA API correctly
> or is this just reflecting the current performance characteristics of the
> LDA implementation? My code can be found here:
>
>
> https://github.com/marko-asplund/tech-protos/blob/master/mllib-lda/src/main/scala/fi/markoa/proto/mllib/LDADemo.scala#L56-L57
>
> For what kinds of use cases are people currently using the LDA
> implementation?
>
>
> marko
>
