mahout-user mailing list archives

From qiaoresearcher <qiaoresearc...@gmail.com>
Subject Re: need help on mahout
Date Fri, 09 Nov 2012 21:44:16 GMT
Many thanks, I may need some time to digest the information you
provide... :-)

Have a nice weekend,


On Fri, Nov 9, 2012 at 3:34 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> No, SGD (stochastic gradient descent) and factorization are two different
> things. More strictly, they address two different classes of problems:
> factorization and regression. SGD is one technique for training
> regression-based classifiers. Factorization is finding latent ("virtual")
> factors in a user/item space (ALS-WR is one of the methods to find such
> factors).
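To make the factorization side concrete, here is a minimal in-memory sketch of ALS-WR (alternating least squares with weighted-λ regularization) in plain Python. It is not Mahout's implementation; the function names, the dict-of-ratings input, and the default rank, λ and iteration count are assumptions chosen for illustration. Each half-step holds one side's factors fixed and solves a small ridge-regression problem per user (or item), with the penalty scaled by that user's (or item's) number of ratings, which is the "weighted-λ" part.

```python
import random


def solve(a, b):
    """Solve a x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def half_step(fixed, rows, k, lam):
    """Re-solve every row's factor vector while the other side's factors stay fixed."""
    new = {}
    for row, entries in rows.items():
        n = len(entries)
        # Normal equations: (sum_j f_j f_j^T + lam * n * I) x = sum_j r_j f_j
        a = [[lam * n if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for col, r in entries.items():
            f = fixed[col]
            for i in range(k):
                b[i] += r * f[i]
                for j in range(k):
                    a[i][j] += f[i] * f[j]
        new[row] = solve(a, b)
    return new


def als_wr(ratings, k=2, lam=0.05, iterations=20, seed=0):
    """ratings: {(user, item): value}. Returns (user_factors, item_factors) as dicts of lists."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, {})[i] = r
        by_item.setdefault(i, {})[u] = r
    items = {i: [rng.random() for _ in range(k)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = half_step(items, by_user, k, lam)  # fix items, solve for users
        items = half_step(users, by_item, k, lam)  # fix users, solve for items
    return users, items


def predict(users, items, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(x * y for x, y in zip(users[u], items[i]))
```

For example, `als_wr({("alice", "page1"): 5.0, ("bob", "page1"): 4.0, ("bob", "page2"): 1.0})` returns one k-dimensional vector per user and per page. Because every half-step is an exact minimization, the regularized objective never increases from one iteration to the next.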
>
> Yes, SGD is in the book, but not with your example specifically, since I
> meant for you to apply it after you find the latent variables (factors,
> whatever).
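A minimal sketch of that second step, assuming you already have a latent factor vector per web page (or user) and a hypothetical binary label for each: logistic regression fitted by plain SGD, one example at a time. This is not Mahout's SGD classifier; the function names, learning rate and epoch count are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_sgd(examples, rate=0.1, epochs=100, seed=0):
    """examples: list of (feature_vector, label) with label in {0, 1}.
    Returns (weights, bias) fitted by per-example gradient steps on the log-loss."""
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w, b = [0.0] * dim, 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order each epoch
        for x, y in data:
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            g = p - y  # derivative of the log-loss with respect to the linear score
            w = [wi - rate * g * xi for wi, xi in zip(w, x)]
            b -= rate * g
    return w, b


def score(w, b, x):
    """Estimated probability that feature vector x belongs to class 1."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
```

Here each feature vector would be, say, a page's factor vector from the factorization step, so the classifier works in the low-dimensional latent space rather than on raw page data.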
>
> You will get more help on the ALS-WR method by staying on the list, and
> you also create an archive entry for others in a similar situation to
> follow. The idea is that we all learn together and effectively :) (and I
> score more points for support :)
>
> CVB (if I am not totally off) is the collapsed variational Bayes
> implementation of LDA (Latent Dirichlet Allocation), which may help you
> analyze the content of your web pages IF you manage to grab the text off
> of them. In Mahout, it is provided by this package:
>
> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/clustering/lda/cvb/package-summary.html
>
> I don't know exactly where the wiki help on CVB is, but searching the
> Mahout archive and Stack Overflow may help. Again, by staying on the list
> you may get more help with that.
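For intuition about what the CVB package computes, here is a small single-machine sketch of the CVB0 update for LDA in plain Python, assuming documents are already tokenized into word lists. It is not Mahout's code (which distributes the work over MapReduce); the function name, the hyperparameter names `alpha` (document-topic smoothing) and `eta` (topic-word smoothing) and their defaults are assumptions. Each token keeps a soft topic assignment; an update removes the token's own contribution from the expected counts, recomputes its topic weights from the remaining counts, and adds it back.

```python
import random


def cvb0_lda(docs, num_topics, alpha=0.1, eta=0.01, iterations=50, seed=0):
    """docs: list of token lists. Returns (doc_topic, topic_word) probability tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    K, V = num_topics, len(vocab)
    n_wk = {w: [0.0] * K for w in vocab}  # expected count of word w in topic k
    n_jk = [[0.0] * K for _ in docs]      # expected count of topic k in document j
    n_k = [0.0] * K                       # expected total count of topic k

    def shift(j, w, g, sign):
        """Add (sign=+1) or remove (sign=-1) one token's soft assignment from the counts."""
        for k in range(K):
            n_wk[w][k] += sign * g[k]
            n_jk[j][k] += sign * g[k]
            n_k[k] += sign * g[k]

    # gamma[j][n] is the soft topic assignment of token n in document j, randomly initialized.
    gamma = []
    for j, doc in enumerate(docs):
        row = []
        for w in doc:
            g = [rng.random() for _ in range(K)]
            total = sum(g)
            g = [x / total for x in g]
            shift(j, w, g, +1)
            row.append(g)
        gamma.append(row)

    for _ in range(iterations):
        for j, doc in enumerate(docs):
            for n, w in enumerate(doc):
                shift(j, w, gamma[j][n], -1)  # counts now exclude this token
                # CVB0: gamma_k proportional to (N_wk + eta)(N_jk + alpha) / (N_k + V * eta)
                g = [(n_wk[w][k] + eta) * (n_jk[j][k] + alpha) / (n_k[k] + V * eta)
                     for k in range(K)]
                total = sum(g)
                g = [x / total for x in g]
                shift(j, w, g, +1)
                gamma[j][n] = g

    doc_topic = [[(n_jk[j][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
                 for j, doc in enumerate(docs)]
    topic_word = [{w: (n_wk[w][k] + eta) / (n_k[k] + V * eta) for w in vocab}
                  for k in range(K)]
    return doc_topic, topic_word
```

Each row of `doc_topic` is a document's topic mixture, which is the kind of per-page content summary the message has in mind.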
>
> LSA (latent semantic analysis) is another way to analyze the content of
> your web pages. See the Wikipedia article for a refresher, but basically
> it is a run of SVD over the tf-idf vectors of unigrams, bigrams, etc.
> Mahout has a general pipeline to prepare that text data with the
> seqdirectory and seq2sparse commands (again, you can find details in the
> book). Then you just run 'mahout ssvd <options>' on the output of
> seq2sparse and use the rows of the U*Sigma output as the topical
> allocation values. Somebody will probably correct me on this, but I think
> you can use the topical allocation values to further build your
> classification with regressions (SGD).
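A toy, in-memory sketch of the LSA step in plain Python, assuming tokenized documents: build a tf-idf matrix A (using the idf variant log(N/df), an assumption; seq2sparse's weighting may differ), find the top-k right singular vectors V, and return the rows of U*Sigma, which equal A*V. Note that `mahout ssvd` uses a randomized (stochastic) SVD; this sketch swaps in simple power iteration with deflation, which is only practical for small matrices. All function names are hypothetical.

```python
import math
import random


def tfidf_rows(docs):
    """docs: list of token lists -> (rows, vocab); rows[j][t] = tf * log(N / df)."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: t for t, w in enumerate(vocab)}
    df = [0] * len(vocab)
    for doc in docs:
        for w in set(doc):
            df[index[w]] += 1
    rows = []
    for doc in docs:
        tf = [0.0] * len(vocab)
        for w in doc:
            tf[index[w]] += 1.0
        rows.append([c * math.log(len(docs) / df[t]) for t, c in enumerate(tf)])
    return rows, vocab


def top_right_singular_vectors(a, k, iterations=200, seed=0):
    """Power iteration on A^T A with deflation; returns k orthonormal right singular vectors."""
    rng = random.Random(seed)
    cols = len(a[0])
    found = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(cols)]
        for _ in range(iterations):
            av = [sum(x * y for x, y in zip(row, v)) for row in a]                  # A v
            v = [sum(a[r][c] * av[r] for r in range(len(a))) for c in range(cols)]  # A^T (A v)
            for u in found:  # deflation: project out the vectors already found
                d = sum(x * y for x, y in zip(u, v))
                v = [x - d * y for x, y in zip(v, u)]
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                break  # A has rank < k; v stays the zero vector
            v = [x / norm for x in v]
        found.append(v)
    return found


def lsa_document_vectors(docs, k):
    """Rows of U*Sigma for the tf-idf matrix A, computed as A V (since A V = U Sigma)."""
    a, _ = tfidf_rows(docs)
    vs = top_right_singular_vectors(a, k)
    return [[sum(x * y for x, y in zip(row, v)) for v in vs] for row in a]
```

Each returned row is a k-dimensional "topical allocation" vector for one document; these rows can then be used as features for a regression-based classifier, as the message suggests.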
>
> -d
>
>
> On Fri, Nov 9, 2012 at 1:11 PM, qiaoresearcher <qiaoresearcher@gmail.com> wrote:
>
> > Hi Dmitriy,
> >
> > Many thanks for your comments, I really appreciate them, although I think
> > I may not have fully understood you.
> >
> > As I understand it, SGD means stochastic gradient descent, is that right?
> > What I need now is some example code to read the files, construct the
> > web page set, and then form the vectors. Such steps are called
> > 'factorization' in Mahout, right?
> >
> > Do you mean Mahout in Action has examples similar to what I described?
> > What are CVB and LSA, and is SSVD singular value decomposition?
> >
> >
> >
>
