spark-issues mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] [Commented] (SPARK-18599) Add the Spectral LDA algorithm
Date Sun, 27 Nov 2016 16:49:58 GMT


Sean Owen commented on SPARK-18599:

If this is already a usable stand-alone package, does it need to be in Spark? The general
idea is for things like this not to be pushed into the project itself.

> Add the Spectral LDA algorithm
> ------------------------------
>                 Key: SPARK-18599
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jencir Lee
>              Labels: lda
> The Spectral LDA algorithm transforms the LDA problem into an orthogonal tensor decomposition
problem. [[Anandkumar 2012]] establishes theoretical guarantees for the convergence of orthogonal
tensor decomposition.
> This algorithm first builds the 2nd-order and 3rd-order moments from the empirical word counts,
orthogonalizes them, and finally performs the tensor decomposition on the empirical data moments.
The whole procedure consists purely of linear algebra and could leverage machine-native BLAS/LAPACK
libraries (Spark needs to be compiled with the `-Pnetlib-lgpl` option).
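
The moment-building step described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: the function name `empirical_moments` and the parameter `alpha0` (taken here to be the sum of the Dirichlet concentration parameters) are hypothetical, and the shifted second-order moment follows the definition in Anandkumar et al. The orthogonalization and tensor decomposition steps are left out.

```python
from typing import List, Tuple


def empirical_moments(
    docs: List[List[int]], alpha0: float
) -> Tuple[List[float], List[List[float]]]:
    """Return (m1, m2) for word-count vectors over a shared vocabulary.

    m1 is the average word frequency. m2 is the shifted pairwise moment
    E[x1 (x) x2] - alpha0 / (alpha0 + 1) * m1 (x) m1.
    `alpha0` is an assumed input, not something estimated here.
    """
    vocab = len(docs[0])
    # A document needs at least two words to contribute a word pair.
    usable = [c for c in docs if sum(c) >= 2]
    n = len(usable)
    m1 = [0.0] * vocab
    pairs = [[0.0] * vocab for _ in range(vocab)]
    for c in usable:
        length = sum(c)
        for i in range(vocab):
            m1[i] += c[i] / length / n
            for j in range(vocab):
                # Ordered pairs of distinct word positions: c c^T - diag(c).
                count = c[i] * c[j] - (c[i] if i == j else 0)
                pairs[i][j] += count / (length * (length - 1)) / n
    shift = alpha0 / (alpha0 + 1.0)
    m2 = [
        [pairs[i][j] - shift * m1[i] * m1[j] for j in range(vocab)]
        for i in range(vocab)
    ]
    return m1, m2


# Two toy documents over a three-word vocabulary.
m1, m2 = empirical_moments([[2, 0, 1], [0, 3, 1]], alpha0=1.0)
```

The nested loops keep the sketch dependency-free. A practical implementation would express the same sums as matrix products so that BLAS can do the work.
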
> It achieves competitive log-perplexity vs. Online Variational Inference in the shortest
time. It also has clean memory usage: as of v2.0.0 we have experienced crashes due to memory
problems with the built-in Gibbs Sampler and Online Variational Inference, but never with the
Spectral LDA algorithm. This algorithm is linearly scalable.
> The original repo is at We refactored
for the Spark coding style and interfaces when porting over for the PR. We wrote a report
describing the algorithm in detail and listing test results at
It's going to enter our official repo soon.
> Anandkumar, Anima, et al., Tensor decompositions for learning latent variable models,
