incubator-general mailing list archives

From Suneel Marthi <smar...@apache.org>
Subject Re: [DISCUSS] PredictionIO incubation proposal
Date Tue, 17 May 2016 20:57:59 GMT
Thanks for having me as a mentor for PIO.  I would like to be added to the
initial list of committers and am looking to actively participate in the
development too. I am not sure if my being a mentor automatically grants me
the 'commit' karma.

It's already been suggested earlier in this thread by Roman and
Jean-Baptiste that the project needs to be decoupled from Spark and
integrated with Beam.  It would be good to reduce the reliance on
Spark-Submit from what I have seen of the project so far. But let's not
talk architecture and design here when the project's not in incubator yet.
:)




On Tue, May 17, 2016 at 4:09 PM, Henry Saputra <henry.saputra@gmail.com>
wrote:

> Cool, this will make the code grant process easier =)
>
> The initial committers and mentors look great.
> I am sure more will come as contributions start pouring in to the project.
>
> Looking forward to the VOTE thread soon.
>
> - Henry
>
> On Mon, May 16, 2016 at 12:07 PM, Simon Chan <simon@salesforce.com> wrote:
>
> > Yes, it includes everyone who previously contributed code to
> > PredictionIO before the acquisition and still wants to be involved in
> > the project.
> >
> > We may have missed "Alex Merritt", going to add him to the list soon.
> >
> > Simon
> >
> >
> > On Mon, May 16, 2016 at 11:58 AM, Suneel Marthi <smarthi@apache.org>
> > wrote:
> >
> > > I do have a question about the proposed list of committers.
> > >
> > > > Does the list also include all of those folks who were with
> > > > PredictionIO (and had contributed to the project) and then chose to
> > > > leave when PIO was acquired by Salesforce?
> > >
> > >
> > >
> > >
> > > > On Mon, May 16, 2016 at 1:13 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> > > > wrote:
> > >
> > > > > By the way, we have some discussion about integrating Zeppelin
> > > > > with Beam ;)
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On 05/15/2016 02:32 AM, Roman Shaposhnik wrote:
> > > >
> > > > >> Super excited to see this proposal! This will finally allow us to
> > > > >> have an ASF managed backend for next generation data-driven apps
> > > > >> that I see emerging quite rapidly.
> > > > >>
> > > > >> The proposal looks great to me (although I'd recommend calling out
> > > > >> Scala as an implementation language more prominently since it may
> > > > >> attract additional developers with affinity to it).
> > > >>
> > > >> I do have two questions about technology:
> > > > >>     1. do you think it would be possible to leverage Apache Beam
> > > > >>         (incubating) for abstracting away dependency on execution
> > > > >>         frameworks? My understanding is that PredictionIO currently
> > > > >>         only runs on Spark.
> > > > >>     2. is there a potential integration with Apache Zeppelin
> > > > >>         possible?
> > > >>
> > > >> Thanks,
> > > >> Roman.
> > > >>
> > > > >> On Fri, May 13, 2016 at 1:41 PM, Andrew Purtell <apurtell@apache.org>
> > > > >> wrote:
> > > >>
> > > >>> Greetings,
> > > >>>
> > > > >>> It is my pleasure to propose the PredictionIO project for
> > > > >>> incubation at the Apache Software Foundation.
> > > > >>>
> > > > >>> PredictionIO is a popular open source Machine Learning Server
> > > > >>> built on top of a state-of-the-art open source stack, including
> > > > >>> several Apache technologies, that enables developers to manage
> > > > >>> and deploy production-ready predictive services for various
> > > > >>> kinds of machine learning tasks, with more than 400 production
> > > > >>> deployments around the world and a growing contributor
> > > > >>> community.
> > > >>>
> > > >>>
> > > > >>> The text of the proposal is included below and is also available
> > > > >>> at https://wiki.apache.org/incubator/PredictionIO
> > > >>>
> > > >>> Best regards,
> > > >>> Andrew Purtell
> > > >>>
> > > >>>
> > > >>> = PredictionIO Proposal =
> > > >>>
> > > >>> === Abstract ===
> > > > >>> PredictionIO is an open source Machine Learning Server built on
> > > > >>> top of a state-of-the-art open source stack that enables
> > > > >>> developers to manage and deploy production-ready predictive
> > > > >>> services for various kinds of machine learning tasks.
> > > >>>
> > > >>> === Proposal ===
> > > >>> The PredictionIO platform consists of the following components:
> > > >>>
> > > > >>>   * PredictionIO framework - provides the machine learning stack
> > > > >>>   for building, evaluating and deploying engines with machine
> > > > >>>   learning algorithms. It uses Apache Spark for processing.
> > > > >>>
> > > > >>>   * Event Server - the machine learning analytics layer for
> > > > >>>   unifying events from multiple platforms. It can use Apache
> > > > >>>   HBase or any JDBC backend as its data store.
> > > >>>
> > > > >>> The PredictionIO community also maintains a Template Gallery, a
> > > > >>> place to publish and download (free or proprietary) engine
> > > > >>> templates for different types of machine learning applications,
> > > > >>> which is a complementary part of the project. At this point we
> > > > >>> exclude the Template Gallery from the proposal, as it has a
> > > > >>> separate set of contributors and we’re not familiar with an
> > > > >>> Apache approved mechanism to maintain such a gallery.
> > > >>>
> > > > >>> You can find the Template Gallery at
> > > > >>> https://templates.prediction.io/
> > > >>>
> > > >>> === Background ===
> > > > >>> PredictionIO was started with a mission to democratize and bring
> > > > >>> machine learning to the masses.
> > > > >>>
> > > > >>> Machine learning has traditionally been a luxury for big
> > > > >>> companies like Google, Facebook, and Netflix. There are ML
> > > > >>> libraries and tools lying around the internet, but the effort of
> > > > >>> putting them all together as a production-ready infrastructure
> > > > >>> is a very resource-intensive task that is hardly reachable by
> > > > >>> individuals or small businesses.
> > > > >>>
> > > > >>> PredictionIO is a production-ready, full stack machine learning
> > > > >>> system that allows organizations of any scale to quickly deploy
> > > > >>> machine learning capabilities. It comes with official and
> > > > >>> community-contributed machine learning engine templates that are
> > > > >>> easy to customize.
> > > >>>
> > > >>> === Rationale ===
> > > > >>> As the usage of and number of contributors to PredictionIO have
> > > > >>> grown larger and more diverse, we have sought an independent
> > > > >>> framework for the project to keep thriving. We believe the
> > > > >>> Apache foundation is a great fit. Joining Apache would ensure
> > > > >>> that tried and true processes and procedures are in place for
> > > > >>> the growing number of organizations interested in contributing
> > > > >>> to PredictionIO. PredictionIO is also a good fit for the Apache
> > > > >>> foundation. PredictionIO was built on top of several Apache
> > > > >>> projects (HBase, Spark, Hadoop). We are familiar with the Apache
> > > > >>> process and believe that the democratic and meritocratic nature
> > > > >>> of the foundation aligns with the project goals.
> > > >>>
> > > >>> === Initial Goals ===
> > > > >>> The initial milestones will be to move the existing codebase to
> > > > >>> Apache and integrate with the Apache development process. Once
> > > > >>> this is accomplished, we plan for incremental development and
> > > > >>> releases that follow the Apache guidelines, as well as growing
> > > > >>> our developer and user communities.
> > > >>>
> > > >>> === Current Status ===
> > > > >>> PredictionIO has undergone nine minor releases and many patches.
> > > > >>> PredictionIO is being used in production by Salesforce.com as
> > > > >>> well as many other organizations and apps. The PredictionIO
> > > > >>> codebase is currently hosted at GitHub, which will form the
> > > > >>> basis of the Apache git repository.
> > > >>>
> > > >>> ==== Meritocracy ====
> > > > >>> We plan to invest in supporting a meritocracy. We will discuss
> > > > >>> the requirements in an open forum. We intend to invite
> > > > >>> additional developers to participate. We will encourage and
> > > > >>> monitor community participation so that privileges can be
> > > > >>> extended to those that contribute.
> > > >>>
> > > >>> ==== Community ====
> > > > >>> Acceptance into the Apache foundation would bolster the already
> > > > >>> strong user and developer community around PredictionIO. That
> > > > >>> community includes many contributors from various other
> > > > >>> companies, and an active mailing list composed of hundreds of
> > > > >>> users.
> > > >>>
> > > >>> ==== Core Developers ====
> > > > >>> The core developers of our project are listed in our
> > > > >>> contributors and initial PPMC below. Though many are employed at
> > > > >>> Salesforce.com, there are also engineers from ActionML, and
> > > > >>> independent developers.
> > > >>>
> > > >>> === Alignment ===
> > > > >>> The ASF is the natural choice to host the PredictionIO project
> > > > >>> as its goal is democratizing Machine Learning by making it more
> > > > >>> easily accessible to every user/developer. PredictionIO is built
> > > > >>> on top of several top level Apache projects as outlined above.
> > > >>>
> > > >>> === Known Risks ===
> > > >>>
> > > >>> ==== Orphaned products ====
> > > > >>> PredictionIO has a solid and growing community. It is deployed
> > > > >>> in production environments by companies of all sizes to run
> > > > >>> various kinds of predictive engines.
> > > > >>>
> > > > >>> In addition to the community contribution to the PredictionIO
> > > > >>> framework, the community is also actively contributing new
> > > > >>> engines to the Template Gallery as well as SDKs and
> > > > >>> documentation for the project. Salesforce is committed to
> > > > >>> utilizing and advancing the PredictionIO code base and
> > > > >>> supporting its user community.
> > > >>>
> > > >>> ==== Inexperience with Open Source ====
> > > > >>> PredictionIO has existed as a healthy open source project for
> > > > >>> almost two years and is the most starred Scala project on
> > > > >>> GitHub. All of the proposed committers have contributed to ASF
> > > > >>> and Linux Foundation open source projects. Several current
> > > > >>> committers on Apache projects and Apache Members are involved in
> > > > >>> this proposal and intend to provide mentorship.
> > > >>>
> > > >>> ==== Homogeneous Developers ====
> > > > >>> The initial list of committers includes developers from several
> > > > >>> institutions, including Salesforce, ActionML, Channel4, and USC,
> > > > >>> as well as unaffiliated developers.
> > > >>>
> > > >>> ==== Reliance on Salaried Developers ====
> > > > >>> Like most open source projects, PredictionIO receives
> > > > >>> substantial support from salaried developers. PredictionIO
> > > > >>> development is partially supported by Salesforce.com, but there
> > > > >>> are many contributors from various other companies, and an
> > > > >>> active mailing list composed of hundreds of users. We will
> > > > >>> continue our efforts to ensure that stewardship of the project
> > > > >>> is independent of salaried developers by meritocratically
> > > > >>> promoting those contributors to committers.
> > > >>>
> > > > >>> ==== Relationships with Other Apache Products ====
> > > > >>> PredictionIO relies heavily on top level Apache projects such as
> > > > >>> Apache Spark, HBase and Hadoop. However, it brings a distinct
> > > > >>> functionality, rather than just an abstraction - Machine
> > > > >>> Learning in a plug-and-play fashion.
> > > >>>
> > > > >>> Compared to Apache Mahout, which focuses on the development of a
> > > > >>> wide variety of algorithms, PredictionIO offers a platform to
> > > > >>> manage the whole machine learning workflow, including data
> > > > >>> collection, data preparation, modeling, deployment and
> > > > >>> management of predictive services in production environments.
> > > >>>
> > > >>> ==== An Excessive Fascination with the Apache Brand ====
> > > > >>> PredictionIO is already a widely known open source project. This
> > > > >>> proposal is not for the purpose of generating publicity. Rather,
> > > > >>> the primary benefits to joining Apache are those outlined in the
> > > > >>> Rationale section.
> > > >>>
> > > >>> === Documentation ===
> > > > >>> PredictionIO boasts rich and live documentation, which is
> > > > >>> included in the code repo (docs/manual directory), built with
> > > > >>> Middleman, and publicly hosted at https://docs.prediction.io
> > > >>>
> > > >>> === Initial Source and Intellectual Property Submission Plan ===
> > > > >>> Currently, the PredictionIO codebase is distributed under the
> > > > >>> Apache 2.0 License and hosted on GitHub:
> > > > >>> https://github.com/PredictionIO/PredictionIO
> > > >>>
> > > >>> === External Dependencies ===
> > > >>> PredictionIO has the following external dependencies:
> > > > >>>   * Apache Hadoop 2.4.0 (optional, required only if YARN and
> > > > >>>     HDFS are needed)
> > > > >>>   * Apache Spark 1.3.0 for Hadoop 2.4
> > > > >>>   * Java SE Development Kit 8
> > > > >>>   * and one of the following sets:
> > > > >>>
> > > > >>>     * PostgreSQL 9.1
> > > > >>>
> > > > >>>     or
> > > > >>>
> > > > >>>     * MySQL 5.1
> > > > >>>
> > > > >>>     or
> > > > >>>
> > > > >>>     * Apache HBase 0.98.6
> > > > >>>     * Elasticsearch 1.4.0
> > > >>>
> > > > >>> Upon acceptance to the incubator, we would begin a thorough
> > > > >>> analysis of all transitive dependencies to verify this
> > > > >>> information and introduce license checking into the build and
> > > > >>> release process by integrating with Apache RAT.
> > > >>>
> > > >>> === Cryptography ===
> > > > >>> PredictionIO does not include cryptographic code. We utilize
> > > > >>> standard JCE and JSSE APIs provided by the Java Runtime
> > > > >>> Environment.
> > > >>>
> > > >>> === Required Resources ===
> > > > >>> We request that the following resources be created for the
> > > > >>> project to use:
> > > >>>
> > > >>> ==== Mailing lists ====
> > > >>>
> > > > >>> predictionio-private@incubator.apache.org (with moderated
> > > > >>> subscriptions)
> > > >>>
> > > >>> predictionio-dev
> > > >>>
> > > >>> predictionio-user
> > > >>>
> > > >>> predictionio-commits
> > > >>>
> > > >>> We will migrate the existing PredictionIO mailing lists.
> > > >>>
> > > >>> ==== Git repository ====
> > > > >>> The PredictionIO team would like to use Git for source control,
> > > > >>> due to our current use of GitHub.
> > > >>>
> > > >>> git://git.apache.org/incubator-predictionio
> > > >>>
> > > >>> ==== Documentation ====
> > > >>> https://predictionio.incubator.apache.org/docs/
> > > >>>
> > > >>> ==== JIRA instance ====
> > > > >>> PredictionIO currently uses the GitHub issue tracking system
> > > > >>> associated with its repository:
> > > > >>> https://github.com/PredictionIO/PredictionIO/issues .
> > > > >>> We will migrate to Apache JIRA.
> > > >>>
> > > >>> JIRA PREDICTIONIO
> > > >>> https://issues.apache.org/jira/browse/PREDICTIONIO
> > > >>>
> > > >>> ==== Other Resources ====
> > > >>> * TravisCI for builds and test running.
> > > >>>
> > > > >>> * PredictionIO's documentation, included in the code repo
> > > > >>> (docs/manual directory), is built with Middleman and publicly
> > > > >>> hosted at https://docs.prediction.io
> > > > >>>
> > > > >>> * A blog to drive adoption and excitement at
> > > > >>> https://blog.prediction.io
> > > >>>
> > > >>> === Initial Committers ===
> > > >>>
> > > >>> * Pat Ferrell
> > > >>>
> > > >>> * Tamas Jambor
> > > >>>
> > > >>> * Justin Yip
> > > >>>
> > > >>> * Xusen Yin
> > > >>>
> > > >>> * Lee Moon Soo
> > > >>>
> > > >>> * Donald Szeto
> > > >>>
> > > >>> * Kenneth Chan
> > > >>>
> > > >>> * Tom Chan
> > > >>>
> > > >>> * Simon Chan
> > > >>>
> > > >>> * Marco Vivero
> > > >>>
> > > >>> * Matthew Tovbin
> > > >>>
> > > >>> * Yevgeny Khodorkovsky
> > > >>>
> > > >>> * Felipe Oliveira
> > > >>>
> > > >>> * Vitaly Gordon
> > > >>>
> > > >>> === Affiliations ===
> > > >>>
> > > >>> * Pat Ferrell - ActionML
> > > >>>
> > > >>> * Tamas Jambor - Channel4
> > > >>>
> > > >>> * Justin Yip - independent
> > > >>>
> > > >>> * Xusen Yin - USC
> > > >>>
> > > >>> * Lee Moon Soo - NFLabs
> > > >>>
> > > >>> * Donald Szeto - Salesforce
> > > >>>
> > > >>> * Kenneth Chan - Salesforce
> > > >>>
> > > >>> * Tom Chan - Salesforce
> > > >>>
> > > >>> * Simon Chan - Salesforce
> > > >>>
> > > >>> * Marco Vivero - Salesforce
> > > >>>
> > > >>> * Matthew Tovbin - Salesforce
> > > >>>
> > > >>> * Yevgeny Khodorkovsky - Salesforce
> > > >>>
> > > >>> * Felipe Oliveira - Salesforce
> > > >>>
> > > >>> * Vitaly Gordon - Salesforce
> > > >>>
> > > >>> === Sponsors ===
> > > >>>
> > > >>> ==== Champion ====
> > > >>>
> > > >>> Andrew Purtell <apurtell at apache dot org>
> > > >>>
> > > >>> ==== Nominated Mentors ====
> > > >>>
> > > >>> * Andrew Purtell <apurtell at apache dot org>
> > > >>>
> > > >>> * James Taylor <jtaylor at apache dot org>
> > > >>>
> > > >>> * Lars Hofhansl <larsh at apache dot org>
> > > >>>
> > > >>> * Suneel Marthi <smarthi at apache dot org>
> > > >>>
> > > >>> * Xiangrui Meng <meng at apache dot org>
> > > >>>
> > > >>> * Luciano Resende <lresende at apache dot org>
> > > >>>
> > > >>> ==== Sponsoring Entity ====
> > > >>>
> > > >>> Apache Incubator PMC
> > > >>>
> > > >>
> > > >>
> > > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > >> For additional commands, e-mail: general-help@incubator.apache.org
> > > >>
> > > >>
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbonofre@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > > For additional commands, e-mail: general-help@incubator.apache.org
> > > >
> > > >
> > >
> >
>
