mahout-user mailing list archives

From Hector Yee <hector....@gmail.com>
Subject Re: Yahoo LDA
Date Wed, 20 Jul 2011 20:31:56 GMT
The top few coefficients are in lda.topToWor.txt
The rest of it is probably in lda.top
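
If the per-word topic vectors have to be built by hand from the per-token assignments in the file quoted below, a minimal Python sketch might look like this. It assumes every token appears as a (word,topic) pair, as in the quoted sample. The function names, the regular expression, and the topic count of 100 are illustrative assumptions, not part of Yahoo LDA. It also only gives an empirical estimate of P(topic | word) from the sampled assignments, not the model's own coefficients.

```python
import math
import re
from collections import Counter, defaultdict

# Matches "(word,topic)" pairs; the pattern is an assumption based on the sample.
PAIR = re.compile(r"\(([^,()\s]+),(\d+)\)")


def word_topic_counts(lines):
    """Count how often each word was assigned to each topic."""
    counts = defaultdict(Counter)
    for line in lines:
        for word, topic in PAIR.findall(line):
            counts[word][int(topic)] += 1
    return counts


def topic_distribution(counter, num_topics, alpha=0.0):
    """Normalize one word's topic counts into a dense probability vector.

    alpha is an optional additive smoothing constant (hypothetical default 0).
    """
    total = sum(counter.values()) + alpha * num_topics
    return [(counter.get(t, 0) + alpha) / total for t in range(num_topics)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# A few tokens taken from the quoted sample, one document per line.
sample = [
    "www.teddybears.com/\trecreation/toys (teddy,15) (bears,15) (enjoy,2) "
    "(teddy,15) (bears,15) (enjoy,2) (featuring,41) (teddy,15)",
    "www.bearsbythesea.com/\trecreation/toys (teddy,99) (bear,99) (store,81)",
]

NUM_TOPICS = 100  # assumed; use whatever topic count the model was trained with
counts = word_topic_counts(sample)
teddy = topic_distribution(counts["teddy"], NUM_TOPICS)
bears = topic_distribution(counts["bears"], NUM_TOPICS)
print(cosine(teddy, bears))
```

With real output you would pass the open file object to word_topic_counts instead of the sample list. Words that occur only a handful of times give very noisy vectors, so a small positive alpha may help.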


On Wed, Jul 20, 2011 at 12:52 PM, Ian Upright <ian-public@upright.net> wrote:

> Hi,
>
> This is a little off topic, but perhaps someone on this list may be able to
> comment.
>
> I'm still fairly new to LDA, and I've been playing with Yahoo's LDA
> implementation.
>
> The Yahoo code produces a file called:
>
> lda.worToTop.txt
>
> www.teddybears.com/     recreation/toys (teddy,15) (bears,15) (enjoy,2)
> (teddy,15) (bears,15) (enjoy,2) (featuring,41) (teddy,15)
> www.bearsbythesea.com/  recreation/toys (teddy,99) (bear,99) (store,81)
> (pismo,30) (beach,88) (california,24) (specialize,99) (muffy,99) (store,11)
> (complete,11) (collections,46) (checkout,84) (web,87)
>
> So this shows that teddy is in topic 15 and in topic 99.
>
> However, what I thought I would be looking for is a vector in which each
> word is described by its probability of belonging to each topic.  (E.g.,
> with 600 topics I could have a vector that maps that word into each of
> those 600 topics.)
>
> This vector could then be used for calculating similarity against other
> words, etc.  Is this the correct idea?
>
> If so, using the Yahoo LDA output, do I have to calculate that vector and
> its probabilities myself for each unique word, using the above file?
> Perhaps I'm missing something?
>
> Thanks, Ian
>



-- 
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
