spark-issues mailing list archives

From "Mathieu D (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point
Date Tue, 28 Mar 2017 20:27:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945892#comment-15945892 ]

Mathieu D edited comment on SPARK-20082 at 3/28/17 8:27 PM:
------------------------------------------------------------

[~yuhaoyan] would you mind having a look at this PR? Right now, I have added an initialModel,
supported only by the Online optimizer.

Regarding the EM optimizer, I could add new doc vertices and new doc->term edges to the
existing graph. But it's unclear to me how the new doc vertices should be weighted when added.
Right now, for a new model, doc and term vertices are weighted randomly, with the same total
weight on docs and terms. If I add new docs to an existing graph, how should the weights
on the doc side be initialized?
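A minimal sketch of the weighting described above, assuming each doc->term edge carries a token count and that initialization draws a random topic distribution per edge, scales it by that count, and adds the result to both endpoints. That per-edge rule is what makes the doc-side and term-side totals equal. The edge tuples, the dict-based vertex tables, `add_edges`, and the `use_term_prior` option are hypothetical illustrations, not Spark's EMLDAOptimizer code. Seeding new edges from an existing term vertex's topic weights is shown only as one possible choice for new docs, not as an answer to the question.

```python
import random


def _add(table, key, vec):
    """Accumulate a topic-weight vector into a vertex table (dict of lists)."""
    acc = table.setdefault(key, [0.0] * len(vec))
    for i, v in enumerate(vec):
        acc[i] += v


def add_edges(doc_w, term_w, edges, k, rng, use_term_prior=False):
    """Add (doc, term, token_count) edges, updating both vertex tables.

    Each edge gets a topic distribution gamma (summing to 1), scaled by the
    token count, and the same vector is added to the doc and to the term,
    so every edge adds exactly `count` to each side's total weight.
    use_term_prior (hypothetical option): for a term already in the graph,
    use its current topic weights instead of a random draw.
    """
    for doc, term, count in edges:
        if use_term_prior and term in term_w:
            base = term_w[term]  # existing topic weights of this term
        else:
            base = [rng.random() for _ in range(k)]  # random soft assignment
        s = sum(base)
        gamma = [count * x / s for x in base]  # computed before any mutation
        _add(doc_w, doc, gamma)
        _add(term_w, term, gamma)


def total(table):
    """Total weight over all vertices in a table."""
    return sum(sum(v) for v in table.values())


rng = random.Random(42)
doc_w, term_w = {}, {}
k = 3
# Initial graph: 2 + 1 + 4 = 7 tokens.
add_edges(doc_w, term_w, [("d0", "t0", 2), ("d0", "t1", 1), ("d1", "t1", 4)], k, rng)
# New doc added later: 3 + 1 = 4 more tokens ("t2" is a new term, so it falls back to random).
add_edges(doc_w, term_w, [("d2", "t0", 3), ("d2", "t2", 1)], k, rng, use_term_prior=True)
print(round(total(doc_w), 9), round(total(term_w), 9))
```

Whichever way gamma is chosen for a new edge, the doc and term sides each grow by the edge's token count, so both totals here come to 11 up to floating-point rounding; the open question is only how the mass is spread across topics.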


was (Author: mathieude):
[~yuhaoyan] would you mind having a look at this PR? Right now, I have added an initialModel only
for the Online optimizer.


> Incremental update of LDA model, by adding initialModel as start point
> ----------------------------------------------------------------------
>
>                 Key: SPARK-20082
>                 URL: https://issues.apache.org/jira/browse/SPARK-20082
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Mathieu D
>
> Some MLlib models support an initialModel to start from, and can update it incrementally with
> new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally update
> an existing model with batches of new documents.
> I suggest adding an initialModel as a starting point for LDA.
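The incremental update the issue refers to can be sketched with the global step of online variational Bayes for LDA (Hoffman, Blei and Bach, 2010): after an E-step on a mini-batch S, the topic matrix λ is blended with a batch estimate λ̂ = η + (D/|S|)·sstats using a step size ρ_t = (τ0 + t)^(-κ). Starting from an initial model then means using its topic matrix as the first λ instead of a random one. This is a sketch of that update under stated assumptions, not Spark's implementation: the E-step is omitted, `sstats` and `initial_lam` are made-up inputs, and the `tau0`/`kappa` defaults are illustrative values named after the OnlineLDAOptimizer parameters.

```python
def step_size(t, tau0=1024.0, kappa=0.51):
    """Learning rate rho_t = (tau0 + t) ** -kappa; it shrinks as t grows."""
    return (tau0 + t) ** (-kappa)


def online_update(lam, sstats, t, eta, corpus_size, batch_size,
                  tau0=1024.0, kappa=0.51):
    """One global update of the K x V topic matrix lam.

    sstats: expected topic-word counts for the mini-batch (K x V), assumed to
    come from an E-step run against the current lam (not shown here).
    Returns (1 - rho) * lam + rho * (eta + (D / |S|) * sstats).
    """
    rho = step_size(t, tau0, kappa)
    scale = corpus_size / batch_size
    return [
        [(1.0 - rho) * l + rho * (eta + scale * s) for l, s in zip(lam_row, s_row)]
        for lam_row, s_row in zip(lam, sstats)
    ]


# Incremental training: start from an existing model's topic matrix
# (a made-up 2 x 3 example) rather than a random initialization.
initial_lam = [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]]
batch_sstats = [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
# tau0=1, kappa=1, t=1 gives rho = 0.5, which keeps the arithmetic easy to follow.
lam = online_update(initial_lam, batch_sstats, t=1, eta=0.0,
                    corpus_size=2, batch_size=2, tau0=1.0, kappa=1.0)
print(lam)
```

One design point when resuming from an initial model: because ρ_t decreases with t, whether the iteration counter restarts at 0 or continues from the count reached when the initial model was trained determines how strongly the first new batches overwrite the initial topics.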



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


