madlib-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [madlib] hpandeycodeit commented on issue #432: MADLIB-1351 : Added stopping criteria on perplexity to LDA
Date Fri, 01 Nov 2019 17:25:32 GMT
URL: https://github.com/apache/madlib/pull/432#issuecomment-548876785
 
 
   @fmcquillan99, 
   
   Although the model table stays the same, `lda_predict` randomly initializes the output
table. That is why the perplexity calculated in `lda_train` differs from the value returned
by `lda_get_perplexity()`.
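   
   A minimal sketch of how that difference could be reproduced, assuming the three-argument form of `madlib.lda_predict` (data table, model table, output table) and a hypothetical output table name `lda_predict_out` that does not appear elsewhere in this thread:
   
   ```sql
   -- Hypothetical table name, used only for illustration.
   DROP TABLE IF EXISTS lda_predict_out;
   
   -- Score the same documents against the trained model.
   -- The output table is randomly initialized, so topic assignments can differ from training.
   SELECT madlib.lda_predict( 'documents_tf',     -- documents table in the form of term frequency
                              'lda_model_perp',   -- model table created by lda_train
                              'lda_predict_out'   -- prediction output table (assumed name)
                            );
   
   -- Perplexity computed on the prediction output rather than the training output.
   SELECT madlib.lda_get_perplexity( 'lda_model_perp',
                                     'lda_predict_out'
                                   );
   ```
   
   Under these assumptions, the value returned here would generally not equal the last entry of the `perplexity` array stored by `lda_train`.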
   
   However, if the same output table (generated by `lda_train`) is passed to
`lda_get_perplexity()`, the perplexity values match. For example:
   
   ```
   DROP TABLE IF EXISTS lda_model_perp, lda_output_data_perp;
   
   SELECT madlib.lda_train( 'documents_tf',          -- documents table in the form of term frequency
                            'lda_model_perp',        -- model table created by LDA training (not human readable)
                            'lda_output_data_perp',  -- readable output data table
                            385,                     -- vocabulary size
                            5,                       -- number of topics
                            10,                      -- number of iterations
                            5,                       -- Dirichlet prior for the per-doc topic multinomial (alpha)
                            0.01,                    -- Dirichlet prior for the per-topic word multinomial (beta)
                            1,                       -- Evaluate perplexity every n iterations
                            0.2                      -- Stopping perplexity tolerance
                          );
   
   ```
   
   This generates the following perplexity values, the last of which is **179.380131412**:

   
   ```
   postgres=# select perplexity from lda_model_perp ;
                                                                     perplexity
   ----------------------------------------------------------------------------------------------------------------------------------------------
    {196.940707618,193.245742228,191.155602156,185.314159394,182.901929923,187.283749958,186.944341124,185.508311039,185.72038473,179.380131412}
   (1 row)
   
   ```
   
   Running `lda_get_perplexity()` on the output table `lda_output_data_perp` generated above
produces the following perplexity:
   
   ```
   postgres=# SELECT madlib.lda_get_perplexity( 'lda_model_perp',
   postgres(#                                   'lda_output_data_perp'
   postgres(#                                 );
    lda_get_perplexity 
   --------------------
      179.380131412469
   ```
   
   which matches the last perplexity value calculated by `lda_train`.
   
   Thanks! 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
