spark-dev mailing list archives

From xwei <weixiao...@gmail.com>
Subject Re: Contributing to MLlib on GLM
Date Mon, 07 Jul 2014 19:00:57 GMT
Hi Gang,

No admin has looked at our patch yet :( Do you have any suggestions for
getting it noticed?

Best regards,

Xiaokai


On Mon, Jun 30, 2014 at 8:18 PM, Gang Bai [via Apache Spark Developers
List] <ml-node+s1001551n7131h2@n3.nabble.com> wrote:

> Thanks Xiaokai,
>
> I’ve created a pull request to merge the features from my PR into your repo.
> Please review it here: https://github.com/xwei-datageek/spark/pull/2 .
>
> As for GLMs, here at Sina we are working on predicting the number of
> visitors who read a particular news article or watch an online sports
> live stream in a given period. I’m trying to improve the prediction
> results by tuning features and incorporating new models, so I’ll try
> Gamma regression later. Thanks for the implementation.
>
> Cheers,
> -Gang
>
> On Jun 29, 2014, at 8:17 AM, xwei <[hidden email]> wrote:
>
> > Hi Gang,
> >
> > No worries!
> >
> > I agree that LBFGS would converge faster and that your test suite is
> > more comprehensive. I'd like to merge my branch with yours.
> >
> > I also agree with your point about redundancy. Different GLMs usually
> > differ only in the gradient calculation, while the ****regression.scala
> > files are quite similar. For example, linearRegressionSGD,
> > logisticRegressionSGD, RidgeRegressionSGD, and poissonRegressionSGD all
> > share quite a bit of common code in their class implementations. Since
> > this redundancy already exists in the legacy code, merging only Poisson
> > and Gamma would not help much. So I suggest we leave them as separate
> > classes for the time being.
> >
> >
> > Best regards,
> >
> > Xiaokai
> >
> > On Jun 27, 2014, at 6:45 PM, Gang Bai [via Apache Spark Developers
> > List] wrote:
> >
> >> Hi Xiaokai,
> >>
> >> My bad. I didn't notice this before I created another PR for Poisson
> >> regression; the corporate mail filter had buried the emails in junk.
> >> Also, thanks for considering my comments and advice in your PR.
> >>
> >> Adding my two cents here:
> >>
> >> * PoissonRegressionModel and GammaRegressionModel have the same fields
> >>   and prediction method. Shall we use one class, say a LogLinearModel,
> >>   instead of two redundant ones?
> >> * The LBFGS optimizer takes fewer iterations and converges better than
> >>   SGD. I implemented two GeneralizedLinearAlgorithm classes, using LBFGS
> >>   and SGD respectively. You may take a look at them. If it's OK with
> >>   you, I'd be happy to send a PR to your branch.
> >> * In addition to the generated test data, we may use some real-world
> >>   data for testing. In my implementation, I added the test data from
> >>   https://onlinecourses.science.psu.edu/stat504/node/223. Please check
> >>   my test suite.
> >>
> >> -Gang
> >> Sent from my iPad
> >>
> >>> On June 27, 2014, at 6:03 PM, "xwei" <[hidden email]> wrote:
> >>>
> >>>
> >>> Yes, that's what we did: we added two gradient functions to
> >>> Gradient.scala and created PoissonRegression and GammaRegression using
> >>> these gradients. We made a PR for this.
> >>>
> >>>
> >>>
> >>> --
> >>> View this message in context:
> >>> http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7088.html
> >>> Sent from the Apache Spark Developers List mailing list archive at
> >>> Nabble.com.
> >>
> >>
> >> If you reply to this email, your message will be added to the
> >> discussion below:
> >> http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7107.html
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7117.html
> > Sent from the Apache Spark Developers List mailing list archive at
> > Nabble.com.
>
>
>
>
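
For readers following the redundancy discussion quoted above, here is a
minimal sketch of why Poisson and Gamma regression can share one model class
and differ only in the gradient. It is plain Python, not MLlib's Scala code,
and every name in it (dot, poisson_gradient, gamma_gradient, sgd_step, and
this LogLinearModel) is hypothetical rather than taken from either PR. It
assumes a log link for both families and, for Gamma, a fixed shape parameter,
so the per-example losses are exp(w.x) - y*(w.x) and y*exp(-(w.x)) + (w.x)
up to constants.

```python
import math


def dot(w, x):
    """Inner product of two equal-length lists of floats."""
    return sum(wi * xi for wi, xi in zip(w, x))


def poisson_gradient(w, x, y):
    """Gradient of the per-example loss exp(w.x) - y * (w.x)."""
    scale = math.exp(dot(w, x)) - y
    return [scale * xi for xi in x]


def gamma_gradient(w, x, y):
    """Gradient of the per-example loss y * exp(-(w.x)) + (w.x)."""
    scale = 1.0 - y * math.exp(-dot(w, x))
    return [scale * xi for xi in x]


class LogLinearModel:
    """Hypothetical shared model: both families predict exp(w.x)."""

    def __init__(self, weights):
        self.weights = list(weights)

    def predict(self, x):
        return math.exp(dot(self.weights, x))


def sgd_step(gradient, w, x, y, step_size):
    """One plain SGD update; only `gradient` depends on the family."""
    g = gradient(w, x, y)
    return [wi - step_size * gi for wi, gi in zip(w, g)]
```

Both gradients vanish when the prediction exp(w.x) equals y, which is why a
single prediction method can serve both families. Passing poisson_gradient or
gamma_gradient to sgd_step is the only place where the two differ.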




--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Contributing-to-MLlib-on-GLM-tp7033p7197.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.