spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-18599) Add the Spectral LDA algorithm
Date Sun, 27 Nov 2016 16:38:58 GMT

     [ https://issues.apache.org/jira/browse/SPARK-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-18599:
------------------------------------

    Assignee: Apache Spark

> Add the Spectral LDA algorithm
> ------------------------------
>
>                 Key: SPARK-18599
>                 URL: https://issues.apache.org/jira/browse/SPARK-18599
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jencir Lee
>            Assignee: Apache Spark
>              Labels: lda
>
> The Spectral LDA algorithm transforms the LDA problem into an orthogonal tensor decomposition
problem. [[Anandkumar 2012]] establishes theoretical guarantees for the convergence of orthogonal
tensor decomposition.
> This algorithm first builds the 2nd-order and 3rd-order moments from the empirical word counts,
orthogonalizes them, and finally performs the tensor decomposition on the empirical data moments.
The whole procedure is pure linear algebra and can leverage machine-native BLAS/LAPACK libraries
(Spark needs to be compiled with the `-Pnetlib-lgpl` option).
> It achieves competitive log-perplexity vs. Online Variational Inference in the shortest
time. It also has clean memory usage -- as of v2.0.0 we've experienced crashes due to memory
problems with the built-in Gibbs Sampler or Online Variational Inference, but never with the
Spectral LDA algorithm. This algorithm scales linearly.
> The original repo is at https://github.com/FurongHuang/SpectralLDA-TensorSpark. We refactored
it to follow the Spark coding style and interfaces when porting it over for the PR. We wrote a report
describing the algorithm in detail and listing test results at https://www.overleaf.com/read/wscdvwrjmtmw.
The report will be added to our official repo soon.
> REFERENCES
> Anandkumar, Anima, et al., Tensor decompositions for learning latent variable models,
2012, https://arxiv.org/abs/1210.7559.
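
For readers unfamiliar with the final step described above, the following is a minimal pure-Python sketch of tensor power iteration with deflation on a small symmetric 3rd-order tensor that is orthogonally decomposable by construction. It is an illustration only, not the code in the PR or the original repo: the function names (`rank_one`, `contract`, `top_component`, `decompose`) and the synthetic 3x3x3 tensor are assumptions made for this example, and the moment construction and orthogonalization steps are omitted. In the real algorithm the input would be the orthogonalized empirical 3rd-order moment rather than an exact synthetic tensor.

```python
import math
import random


def rank_one(weight, v):
    """Return weight times the triple outer product of v with itself."""
    return [[[weight * a * b * c for c in v] for b in v] for a in v]


def add_scaled(t1, t2, scale):
    """Return t1 + scale * t2 for two 3rd-order tensors of equal shape."""
    return [[[x + scale * y for x, y in zip(r1, r2)]
             for r1, r2 in zip(m1, m2)]
            for m1, m2 in zip(t1, t2)]


def contract(t, u):
    """Return the vector T(I, u, u): contract the last two modes with u."""
    return [sum(t_ijk * u_j * u_k
                for row, u_j in zip(mat, u)
                for t_ijk, u_k in zip(row, u))
            for mat in t]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_component(t, rng, restarts=10, iters=50):
    """Run power iteration from several random starts; keep the best result."""
    best = None
    for _ in range(restarts):
        u = normalize([rng.gauss(0.0, 1.0) for _ in t])
        for _ in range(iters):
            u = normalize(contract(t, u))
        value = dot(u, contract(t, u))  # T(u, u, u), the estimated weight
        if best is None or value > best[0]:
            best = (value, u)
    return best


def decompose(t, rank, seed=0):
    """Extract `rank` (weight, vector) pairs, deflating after each one."""
    rng = random.Random(seed)
    components = []
    for _ in range(rank):
        weight, vec = top_component(t, rng)
        components.append((weight, vec))
        t = add_scaled(t, rank_one(weight, vec), -1.0)  # remove found component
    return components


# Synthetic example (assumed data): three orthonormal vectors with positive weights.
s = 1.0 / math.sqrt(2.0)
weights = [3.0, 2.0, 1.0]
basis = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
tensor = rank_one(0.0, basis[0])  # all-zero tensor of the right shape
for w, v in zip(weights, basis):
    tensor = add_scaled(tensor, rank_one(w, v), 1.0)

components = decompose(tensor, rank=3)
for weight, vec in sorted(components, key=lambda c: -c[0]):
    print(round(weight, 6), [round(x, 6) for x in vec])
```

The random restarts are there because a single starting point converges to whichever component it happens to favor; keeping the result with the largest estimated weight makes the extraction order more predictable. On noisy empirical moments the recovered weights and vectors would only be approximate.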



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


