mahout-dev mailing list archives

From "Jake Mannix (JIRA)" <>
Subject [jira] [Commented] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix
Date Sun, 02 Jun 2013 17:18:20 GMT


Jake Mannix commented on MAHOUT-1147:

Excellent, I'll look this over later tonight.
> CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix
> -----------------------------------------------------------------------------------
>                 Key: MAHOUT-1147
>                 URL:
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.7
>         Environment: Eclipse IDE
> Java code base
> CVB0Driver Class
> setModelPaths(Job job, Path modelPath) - method
>            Reporter: Jack Pay
>            Assignee: Jake Mannix
>              Labels: bug, cvb, fix, suggestion
>             Fix For: 0.8
>         Attachments: MAHOUT-1147.patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> Problem:
> When training the doc/topic model, no paths for the term/topic model are found (the lookup returns null).
> These paths are set using setModelPaths in CVB0Driver.
> Reason for Problem:
> This method is called with a variety of Job instances.
> The Job is passed to the method instead of the Configuration object given to the Job.
> The configuration is retrieved from the Job instance itself.
> I believe that this Configuration instance is a clone of the original.
> This is a problem, as the variable MODEL_PATHS is set on the clone, which is then discarded
> when the given Job is complete.
> The original Configuration has no MODEL_PATHS String set and therefore returns null.
> The code stipulates that if it cannot find a model, it uses a new random matrix. This happens
> every time, as MODEL_PATHS is not set on the Configuration instance used.
> Solution:
> Do not pass the Job to the setModelPaths method; instead, pass the Configuration instance that
> was passed into the method which created the Job.
> i.e.
> change from:
> setModelPaths(Job job, Path modelPath)
> to:
> setModelPaths(Configuration conf, Path modelPath)
> And change all calling methods accordingly (obviously).
> So far, what little testing I have done suggests that this solves the problem.
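
The behaviour described in the report can be illustrated with a small, dependency-free sketch. This is not the Mahout or Hadoop code: Conf and Job below are hypothetical stand-ins, written on the assumption (as the reporter believes) that a Job keeps its own copy of the Configuration it is given. The key "model.paths" and the path "/tmp/model-1" are likewise made up for illustration, and the second variant assumes the path is set before the Job is constructed.

```java
import java.util.HashMap;
import java.util.Map;

public class ModelPathsSketch {
    // Hypothetical key name, used only for this illustration.
    static final String MODEL_PATHS = "model.paths";

    /** Hypothetical stand-in for a Hadoop-style Configuration. */
    static class Conf {
        private final Map<String, String> props = new HashMap<>();
        Conf() {}
        Conf(Conf other) { props.putAll(other.props); } // copy constructor
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); } // null if unset
    }

    /** Hypothetical stand-in for a Job that copies the Conf it receives. */
    static class Job {
        private final Conf conf;
        Job(Conf original) { this.conf = new Conf(original); }
        Conf getConfiguration() { return conf; }
    }

    // Variant from the report: writes to the Job's private copy only.
    static void setModelPaths(Job job, String modelPath) {
        job.getConfiguration().set(MODEL_PATHS, modelPath);
    }

    // Proposed variant: writes to the caller's Configuration directly.
    static void setModelPaths(Conf conf, String modelPath) {
        conf.set(MODEL_PATHS, modelPath);
    }

    /** Sets the path via the Job, then reads it from the original Conf. */
    static String afterJobVariant() {
        Conf conf = new Conf();
        Job job = new Job(conf);
        setModelPaths(job, "/tmp/model-1");
        return conf.get(MODEL_PATHS); // the original never saw the value
    }

    /** Sets the path on the Conf first, then reads it from a Job built from it. */
    static String afterConfVariant() {
        Conf conf = new Conf();
        setModelPaths(conf, "/tmp/model-1");
        Job job = new Job(conf);
        return job.getConfiguration().get(MODEL_PATHS);
    }

    public static void main(String[] args) {
        System.out.println("Job variant:  " + afterJobVariant());  // prints "Job variant:  null"
        System.out.println("Conf variant: " + afterConfVariant()); // prints "Conf variant: /tmp/model-1"
    }
}
```

Under these assumptions, the value set through the Job never reaches the original Configuration, which matches the null lookup and the random-matrix fallback described above.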
