madlib-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [madlib] fmcquillan99 commented on issue #432: MADLIB-1351 : Added stopping criteria on perplexity to LDA
Date Fri, 01 Nov 2019 19:02:17 GMT
fmcquillan99 commented on issue #432: MADLIB-1351 : Added stopping criteria on perplexity to LDA
URL: https://github.com/apache/madlib/pull/432#issuecomment-548912324
 
 
   (6)  
   NULLs not being handled properly
   ```
   DROP TABLE IF EXISTS lda_model_perp, lda_output_data_perp;
   
   SELECT madlib.lda_train( 'documents_tf',          -- documents table in the form of term frequency
                            'lda_model_perp',        -- model table created by LDA training (not human readable)
                            'lda_output_data_perp',  -- readable output data table
                            384,                     -- vocabulary size
                            5,                       -- number of topics
                            20,                      -- number of iterations
                            5,                       -- Dirichlet prior for the per-doc topic multinomial (alpha)
                            0.01,                    -- Dirichlet prior for the per-topic word multinomial (beta)
                            NULL,                    -- Evaluate perplexity every n iterations
                            NULL                     -- Stopping perplexity tolerance
                          );
   
   InternalError: (psycopg2.InternalError) plpy.Error: invalid argument: perplexity_tol should not be less than 0 (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "lda_train", line 22, in <module>
       voc_size, topic_num, iter_num, alpha, beta,evaluate_every , perplexity_tol)
     PL/Python function "lda_train", line 525, in lda_train
     PL/Python function "lda_train", line 96, in _assert
   PL/Python function "lda_train"
    [SQL: "SELECT madlib.lda_train( 'documents_tf',          -- documents table in the form of term frequency\n                         'lda_model_perp',        -- model table created by LDA training (not human readable)\n                         'lda_output_data_perp',  -- readable output data table \n                         384,                     -- vocabulary size\n                         5,                        -- number of topics\n                         20,                      -- number of iterations\n                         5,                       -- Dirichlet prior for the per-doc topic multinomial (alpha)\n                         0.01,                    -- Dirichlet prior for the per-topic word multinomial (beta)\n                         NULL,                       -- Evaluate perplexity every n iterations\n                         NULL                      -- Stopping perplexity tolerance\n                       );"]
   ```
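
One possible way to handle this, sketched in Python because the traceback points at the PL/Python `lda_train` function: SQL NULL arrives in PL/Python as `None`, so the arguments could be replaced with defaults before the `perplexity_tol` range check runs. The helper name `resolve_perplexity_args` and the default values (`0` for `evaluate_every`, `0.1` for `perplexity_tol`) are assumptions made for illustration; they are not taken from the PR, and the actual fix may look different.

```python
# Hypothetical defaults: assumptions for this sketch, not values confirmed by the PR.
DEFAULT_EVALUATE_EVERY = 0    # assumed to mean "do not evaluate perplexity"
DEFAULT_PERPLEXITY_TOL = 0.1  # assumed stopping tolerance


def resolve_perplexity_args(evaluate_every=None, perplexity_tol=None):
    """Replace SQL NULLs (None in PL/Python) with defaults, then validate.

    Hypothetical helper; not part of MADlib.
    """
    # Substitute defaults first so that None never reaches the range check.
    if evaluate_every is None:
        evaluate_every = DEFAULT_EVALUATE_EVERY
    if perplexity_tol is None:
        perplexity_tol = DEFAULT_PERPLEXITY_TOL

    # Same condition as the error message in the traceback above.
    if perplexity_tol < 0:
        raise ValueError(
            "invalid argument: perplexity_tol should not be less than 0")
    return evaluate_every, perplexity_tol
```

With this ordering, `resolve_perplexity_args(None, None)` returns the assumed defaults instead of raising, while an explicitly negative tolerance is still rejected.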
   

