mahout-dev mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-399) LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.
Date Sat, 31 Jul 2010 15:21:16 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894288#action_12894288 ]

Sean Owen commented on MAHOUT-399:
----------------------------------

What's the verdict here? Is the implementation probably OK, or does this need more study?

> LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-399
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-399
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.3
>         Environment: Mac OS X 10.6.2, Hadoop 0.20.2, Mahout 0.3.
>            Reporter: Michael Lazarus
>            Priority: Critical
>         Attachments: olt.tar, Overlapping Pyramids Toy Dataset.pdf
>
>
> Hello,
> Apologies if I have not labeled this correctly.
> I have run a toy problem on Mahout 0.3 (locally) for LDA that I used to test Blei's C
> version of LDA that he posts on his site. It has an exact solution that the LDA should
> converge to. Please see the attached PDF that describes the intended output.
> Is LDA working?  The following output indicates some sort of collapsing behavior to me.
> T0 	T1 	T2 	T3 	T4
> x 	w 	x 	u 	x
> u 	u 	g 	j 	n
> l 	r 	i 	m 	l
> j 	q 	h 	h 	p
> v 	p 	e 	i 	q
> e 	t 	f 	g 	v
> d 	s 	d 	f 	o
> b 	c 	b 	n 	k
> y 	f 	c 	l 	m
> w 	v 	u 	v 	u
> c 	d 	p 	y 	t
> k 	o 	l 	r 	r
> i 	b 	j 	k 	j
> f 	e 	k 	e 	f
> g 	x 	y 	s 	y
> t 	y 	w 	b 	w
> h 	i 	s 	p 	s
> o 	l 	v 	x 	d
> q 	j 	t 	d 	i
> n 	k 	o 	t 	b
> The intended output is (again, please see attached):
> D 	I 	N 	S 	X
> d 	i 	n 	s 	x
> c 	h 	m 	t 	y
> e 	j 	o 	r 	w
> b 	k 	l 	u 	v
> f 	g 	p 	q 	a
> a 	f 	k 	p 	b
> g 	l 	q 	v 	u
> h 	m 	j 	w 	t
> y 	u 	r 	o 	c
> n 	s 	d 	d 	i
> s 	e 	x 	f 	f
> r 	q 	i 	i 	n
> m 	v 	w 	c 	o
> o 	w 	u 	a 	h
> q 	n 	s 	h 	g
> p 	t 	c 	x 	d
> t 	x 	f 	e 	l
> x 	d 	e 	j 	s
> w 	y 	g 	b 	j
> i 	r 	y 	n 	r
> u 	o 	h 	y 	m
> k 	b 	t 	l 	e
> v 	c 	a 	m 	k
> j 	a 	b 	g 	p
> l 	p 	v 	k 	q
> What tests do you run to make sure the output is correct?
> Thank you,
> Mike.
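
One way to put a number on the mismatch reported above is to compare the top words of each learned topic with the top words of each intended topic. The following is only a minimal sketch, not a test that Mahout ships: it assumes the rows of both tables are ordered from most to least probable word, it uses only the first five rows of each table as quoted in the report, and the helper names (jaccard, best_match_scores) are made up for illustration. A score of 1.0 would mean an intended topic was recovered exactly at that cutoff.

```python
def jaccard(a, b):
    """Jaccard similarity of two word collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_scores(learned, expected):
    """For each expected topic, return its best Jaccard score over all learned topics."""
    return {
        name: max(jaccard(words, lw) for lw in learned.values())
        for name, words in expected.items()
    }


# Top five words per topic, copied from the first five rows of the two tables
# in the report (assumption: rows are sorted by descending probability).
learned = {
    "T0": "x u l j v".split(),
    "T1": "w u r q p".split(),
    "T2": "x g i h e".split(),
    "T3": "u j m h i".split(),
    "T4": "x n l p q".split(),
}
expected = {
    "D": "d c e b f".split(),
    "I": "i h j k g".split(),
    "N": "n m o l p".split(),
    "S": "s t r u q".split(),
    "X": "x y w v a".split(),
}

scores = best_match_scores(learned, expected)
for name in sorted(scores):
    print(name, round(scores[name], 3))
```

A low best-match score for every intended topic would support the "collapsing" reading; scores near 1.0 would suggest the topics were recovered and merely listed in a different order.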

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
