spark-reviews mailing list archives

From loachli <...@git.apache.org>
Subject [GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...
Date Fri, 09 Jan 2015 01:06:04 GMT
Github user loachli commented on the pull request:

    https://github.com/apache/spark/pull/1290#issuecomment-69278665
  
    Hi jkbradley:
      Could you tell me the JIRA number for the “new spark.ml package and its design doc”?
    
    
    From: jkbradley [mailto:notifications@github.com]
    Sent: January 9, 2015 3:51
    To: apache/spark
    Cc: Lizhengbing (bing, BIPA)
    Subject: Re: [spark] [MLLIB] [spark-2352] Implementation of an Artificial Neural Network (ANN) (#1290)
    
    
    @bgreeven<https://github.com/bgreeven> I’m not too surprised that the majority vote (a.k.a. one vs. all) did not do very well; it does not scale well with the number of classes. In my experience, a tree (or better yet, error-correcting output codes) generally works better.
    
    @avulanov<https://github.com/avulanov> True, we try for consistency with APIs, except where we’re changing the norm. There is no clear write-up of the “norm,” although the new spark.ml package and its design doc (in the JIRA) give an overview of some parts. Basically, we’re aiming to make things more pluggable and extensible while minimizing API changes. If that requires short-term API changes (such as switching away from ANNWithX method names), that can be acceptable.
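One way to read “switching away from ANNWithX method names” is that the choice of optimizer moves out of the class name and into a parameter of a single estimator, with a default for every parameter so that later additions do not break existing callers. The Python sketch below illustrates only that shape; `ANNClassifier`, its fields, and the `"sgd"`/`"lbfgs"` solver names are hypothetical and are not the spark.ml API.

```python
from dataclasses import dataclass, replace

# Hypothetical solver names; the optimizer is selected by a parameter
# rather than by a class name such as ANNWithSGD.
SUPPORTED_SOLVERS = ("sgd", "lbfgs")


@dataclass(frozen=True)
class ANNClassifier:
    # Every parameter has a default, so a parameter added in a later release
    # leaves existing call sites (which never mention it) unchanged.
    layers: tuple = (4, 5, 3)
    solver: str = "sgd"
    max_iter: int = 100
    step_size: float = 0.03

    def __post_init__(self):
        # Validate the solver at construction time (also runs for copies made by set()).
        if self.solver not in SUPPORTED_SOLVERS:
            raise ValueError(f"unknown solver {self.solver!r}; expected one of {SUPPORTED_SOLVERS}")

    def set(self, **overrides):
        # Return a copy with some parameters changed; unknown names raise TypeError.
        return replace(self, **overrides)


base = ANNClassifier()
tuned = base.set(solver="lbfgs", max_iter=200)
print(tuned.solver, tuned.max_iter, tuned.layers)  # lbfgs 200 (4, 5, 3)
```

Because callers name only the parameters they change, adding a new defaulted parameter later keeps this code working, which is the “minimizing API change” goal.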
    
    @bgreeven<https://github.com/bgreeven> @avulanov<https://github.com/avulanov> The test results look pretty good, though I’m not sure what accuracy to expect. I think the main remaining item is figuring out the public API. That is tough because neural networks / deep learning are a rapidly evolving field with many model and algorithm variants. Ideally, we could put together a design doc (to be linked from the JIRA) for this big feature which would:
    
      *   Design a public API for neural networks and deep learning
         *   Comparison of other major libraries’ APIs
         *   Minimum viable product API for an initial PR
         *   Path for the future:
            *   What extensions might we need to do, and can we keep the public API stable for these?
            *   What extensions might users want to do? Is the API easily extensible and/or pluggable, or can we make it so in the future without changing the existing public API?
      *   Briefly discuss the algorithm
         *   Alg sketch, limitations, etc.
         *   Alternative algorithms, and a path for making the optimization algorithm pluggable in the future (as we’ve discussed a bit in the PR conversation)
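For the last bullet, a pluggable optimizer can be as small as an interface that takes a loss-and-gradient function and a starting point. The sketch below is a hypothetical design, not the PR's code: `Optimizer`, `GradientDescent`, `Momentum`, and `train` are illustrative names, and a toy quadratic loss stands in for the network's loss. Plain gradient descent and heavy-ball momentum are shown as two interchangeable implementations.

```python
from typing import Callable, Protocol, Tuple

import numpy as np

# A loss function returns (loss value, gradient) at the given weights.
LossGrad = Callable[[np.ndarray], Tuple[float, np.ndarray]]


class Optimizer(Protocol):
    """Hypothetical plug-in point: anything with this method can train the model."""

    def minimize(self, loss_grad: LossGrad, w0: np.ndarray) -> np.ndarray: ...


class GradientDescent:
    def __init__(self, step_size: float = 0.1, num_iterations: int = 100):
        self.step_size = step_size
        self.num_iterations = num_iterations

    def minimize(self, loss_grad: LossGrad, w0: np.ndarray) -> np.ndarray:
        w = w0.copy()
        for _ in range(self.num_iterations):
            _, grad = loss_grad(w)
            w = w - self.step_size * grad
        return w


class Momentum:
    """Heavy-ball momentum: v <- beta * v - step_size * grad; w <- w + v."""

    def __init__(self, step_size: float = 0.1, beta: float = 0.5, num_iterations: int = 100):
        self.step_size = step_size
        self.beta = beta
        self.num_iterations = num_iterations

    def minimize(self, loss_grad: LossGrad, w0: np.ndarray) -> np.ndarray:
        w = w0.copy()
        v = np.zeros_like(w)
        for _ in range(self.num_iterations):
            _, grad = loss_grad(w)
            v = self.beta * v - self.step_size * grad
            w = w + v
        return w


def train(loss_grad: LossGrad, w0: np.ndarray, optimizer: Optimizer) -> np.ndarray:
    # Model code depends only on the Optimizer interface, not on a concrete class.
    return optimizer.minimize(loss_grad, w0)


target = np.array([1.0, -2.0])


def quadratic(w: np.ndarray) -> Tuple[float, np.ndarray]:
    # Toy stand-in for a network loss: 0.5 * ||w - target||^2, with gradient w - target.
    diff = w - target
    return 0.5 * float(diff @ diff), diff


for opt in (GradientDescent(), Momentum()):
    print(type(opt).__name__, np.round(train(quadratic, np.zeros(2), opt), 3))
```

Since `train` depends only on the `minimize` method, a user could supply a new optimizer (for example, a hypothetical L-BFGS wrapper) without changing the model-facing API, which is the extensibility question raised in the outline.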
    
    I realize it takes quite a while to get a big new feature ready. If you’d like to encourage early adoption, you could also publish this as a Spark package for now, while the PR is being finalized.
    
    CC: @mengxr<https://github.com/mengxr>
    
    —
    Reply to this email directly or view it on GitHub<https://github.com/apache/spark/pull/1290#issuecomment-69237765>.




