madlib-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [madlib] hpandeycodeit commented on issue #432: MADLIB-1351 : Added stopping criteria on perplexity to LDA
Date Fri, 01 Nov 2019 17:25:55 GMT
hpandeycodeit commented on issue #432: MADLIB-1351 : Added stopping criteria on perplexity to LDA
URL: https://github.com/apache/madlib/pull/432#issuecomment-548876910
 
 
   > (1)
   > Please add `num_iterations` to the output table. This is needed now because
   > we have a perplexity tolerance, so training may not run the maximum number of iterations
   > specified. The model table should look like:
   > 
   > ```
   > model_table
   > ...
   > model	BIGINT[]. The encoded model ...etc...
   > num_iterations	INTEGER. Number of iterations that training ran for,
   > which may be less than the maximum value specified in the parameter 'iter_num' if
   > the perplexity tolerance was reached.
   > perplexity	DOUBLE PRECISION[] Array of ...etc....
   > ...
   > ```
   > 
   > (2)
   > The parameter 'perplexity_tol' can be any value >= 0.0. Currently it errors out below a
   > value of 0.1, which is not correct. I may want to set it to 0.0 so that training runs
   > for the full number of iterations. So please change it to error out if 'perplexity_tol' < 0.
   > 
   > ```
   > DROP TABLE IF EXISTS lda_model_perp, lda_output_data_perp;
   > 
   > SELECT madlib.lda_train( 'documents_tf',          -- documents table in the form of term frequency
   >                          'lda_model_perp',        -- model table created by LDA training (not human readable)
   >                          'lda_output_data_perp',  -- readable output data table
   >                          103,                     -- vocabulary size
   >                          5,                       -- number of topics
   >                          10,                      -- number of iterations
   >                          5,                       -- Dirichlet prior for the per-doc topic multinomial (alpha)
   >                          0.01,                    -- Dirichlet prior for the per-topic word multinomial (beta)
   >                          2,                       -- Evaluate perplexity every 2 iterations
   >                          0.0                      -- Set tolerance to 0 so runs full number of iterations
   >                        );
   > ```
   > 
   > produces
   > 
   > ```
   > InternalError: (psycopg2.InternalError) plpy.Error: invalid argument: perplexity_tol should not be less than .1 (plpython.c:5038)
   > CONTEXT:  Traceback (most recent call last):
   >   PL/Python function "lda_train", line 22, in <module>
   >     voc_size, topic_num, iter_num, alpha, beta,evaluate_every , perplexity_tol)
   >   PL/Python function "lda_train", line 519, in lda_train
   >   PL/Python function "lda_train", line 96, in _assert
   > PL/Python function "lda_train"
   >  [SQL: "SELECT madlib.lda_train( 'documents_tf',          -- documents table in the form of term frequency\n                         'lda_model_perp',        -- model table created by LDA training (not human readable)\n                         'lda_output_data_perp',  -- readable output data table \n                         103,                     -- vocabulary size\n                         5,                       -- number of topics\n                         10,                      -- number of iterations\n                         5,                       -- Dirichlet prior for the per-doc topic multinomial (alpha)\n                         0.01,                    -- Dirichlet prior for the per-topic word multinomial (beta)\n                         2,                       -- Evaluate perplexity every 2 iterations\n                         0.0                      -- Set tolerance to 0 so runs full number of iterations\n                       );"]
   > ```
   
   This is fixed. 
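
For reference, a minimal Python sketch of the behaviour requested in (1) and (2). It is not the code from the PR: the function names `check_perplexity_tol` and `run_iterations`, the `perplexity_at` callback, and the stopping rule (stop when the change between two consecutive perplexity evaluations falls below the tolerance) are all assumptions made for illustration.

```python
def check_perplexity_tol(perplexity_tol):
    """Hypothetical validator: reject only negative tolerances, so 0.0 is accepted."""
    if perplexity_tol < 0:
        raise ValueError(
            "invalid argument: perplexity_tol should not be less than 0")
    return perplexity_tol


def run_iterations(iter_num, evaluate_every, perplexity_tol, perplexity_at):
    """Sketch of a training loop with a perplexity-based stopping rule.

    perplexity_at is a hypothetical callback mapping an iteration number to
    the perplexity measured after that iteration.
    Returns (num_iterations, perplexities).
    """
    check_perplexity_tol(perplexity_tol)
    perplexities = []
    for it in range(1, iter_num + 1):
        # One training pass over the corpus would run here.
        if evaluate_every > 0 and it % evaluate_every == 0:
            perplexities.append(perplexity_at(it))
            # Assumed rule: stop once perplexity changes by less than the
            # tolerance. With a tolerance of 0.0 this is never true, so
            # training runs for the full iter_num iterations.
            if (len(perplexities) >= 2 and
                    abs(perplexities[-1] - perplexities[-2]) < perplexity_tol):
                return it, perplexities
    return iter_num, perplexities
```

Under these assumptions, the first value returned by `run_iterations` is what would be stored as `num_iterations` in the model table: it equals `iter_num` when the tolerance is 0.0 and can be smaller otherwise.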

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
