mahout-dev mailing list archives

From Suneel Marthi <suneel.mar...@gmail.com>
Subject Re: Questions about Minhash/SimHash methods
Date Mon, 12 Jan 2015 01:12:52 GMT
The new Scala and Spark based Math DSL is what Ted was alluding to.

See http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf
    http://mahout.apache.org/users/sparkbindings/home.html
    http://mahout.apache.org/users/sparkbindings/play-with-shell.html



On Sun, Jan 11, 2015 at 7:51 PM, 梁明强 <mqliang031197@gmail.com> wrote:

> Dear Ted Dunning,
>
> Thank you for your reply.
>
> I am new to open source; this is the first time I have been involved in
> an open source project. I have no experience and may need your
> guidance. Your reply is undoubtedly very helpful to me.
>
> As you say, I have only implemented a single-machine algorithm, but this is
> just the first step. I am currently learning the Scala programming language;
> as the next step, I will read some papers about scalable algorithms and try
> to implement them.
>
> In addition, what do you mean by "the new math framework" here?
>
>
> Best regards,
> Liang Mingqiang.
>
>
> 2015-01-11 23:37 GMT+09:00 Ted Dunning <ted.dunning@gmail.com>:
>
> >
> > I just looked a little bit and have a few questions.
> >
> > First, these appear to be Java implementations for a single machine. How
> > scalable is that? How would it interact with the new math framework?
> >
> > Second, there are a number of style issues like author tags, indentation
> > and such, but what I find most troubling is an almost complete lack of
> > javadoc and a complete lack of comments about the origin of the algorithms
> > being used or non-trivial comments about what is happening in the code.  I
> > see comments on sections like "update w". That doesn't say anything that
> > the code doesn't say.
> >
> > Sent from my iPhone
> >
> > > On Jan 10, 2015, at 1:45, Andrew Musselman <andrew.musselman@gmail.com>
> > > wrote:
> > >
> > > Non-negative matrix factorization would be a good addition; if you can
> > > include tests with your pull request it will help.
> > >
> > > Assuming this is your PR:  https://github.com/apache/mahout/pull/70
> > >
> > > Looking forward to more.
> > >
> > >> On Jan 9, 2015, at 11:21 PM, 梁明强 <mqliang031197@gmail.com> wrote:
> > >>
> > >> Dear sir,
> > >>
> > >> This is Liang Mingqiang, an undergraduate student highly interested in
> > >> recommender systems and Mahout. I have implemented the Non-negative
> > >> Matrix Factorization (NMF) and Probabilistic Matrix Factorization (PMF)
> > >> methods and opened a pull request with my code for further comment.
> > >>
> > >> I tested my code on my computer using the MovieLens dataset and got
> > >> reasonable results. Do I need to write and submit a test module for my
> > >> code? Since I need a dataset for my tests, can I add some text files to
> > >> the test package?
> > >>
> > >> In addition, Binary Matrix Factorization (BMF) seems very interesting; I
> > >> want to contribute my BMF code to Mahout as the next step.
> > >>
> > >> Last but not least, Minhash and SimHash are very popular and useful
> > >> methods in recommender systems. But looking through the Mahout source
> > >> code, there seem to be no Minhash or SimHash implementations. Does that
> > >> mean those methods haven't been contributed, or have I just not checked
> > >> the source code carefully? If those two methods have been contributed,
> > >> would anyone be willing to tell me the path? Thank you!
> > >>
> > >>
> > >> Looking forward,
> > >> ----
> > >> Liang Mingqiang
> >
>
