incubator-general mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: [DISCUSS] PredictionIO incubation proposal
Date Fri, 20 May 2016 15:08:13 GMT
It’s great to see such interest, and I’m sure the rest of the podling would agree that the
more the better. I also agree with Suneel: people who know PIO should be given a short bit
of time to get organized before we do the desired expansion. There will be lots of room to
contribute, in any case. For instance, try creating a template; there is no better way to
learn the project.

On May 19, 2016, at 9:16 PM, Suneel Marthi <smarthi@apache.org> wrote:

I definitely have concerns about too many folks becoming initial committers
and bringing their own corporate agendas to this project.

I suggest that first we vote PIO into incubator then bring in those less
experienced with the project. We have a good start with people who have
worked on the project from several orgs. Let us get organized first and
then bring in new people.

I sincerely feel that this is getting real murky with too many cooks with
their own agendas. The fewer external integration points to PIO, the better
the project will evolve.

My 2 cents.


On Thu, May 19, 2016 at 9:03 PM, Andrew Purtell <apurtell@apache.org> wrote:

> Hi Nick,
> 
> Unless there are any concerns or objections, I will add you and Mr.
> Dusenberry to the proposal as initial committers tomorrow.
> 
> Everyone,
> 
> As it seems that discussion has died down I plan to start a VOTE thread on
> this coming Monday.
> 
> Thank you for the comments and attention thus far.
> 
> 
> On Tue, May 17, 2016 at 12:58 PM, Nick Pentreath <nick.pentreath@gmail.com> wrote:
> 
>> Hi there
>> 
>> I'm glad to see the proposal to incubate PredictionIO. In my previous life
>> as a startup co-founder, I kept a close eye on the project, and it would be
>> fantastic to see it become an Apache incubating project!
>> 
>> The folks working on Apache Spark and Apache SystemML (incubating) here at
>> IBM are excited about the possibilities for integrating PredictionIO and
>> SystemML (Mike Dusenberry is a committer on that project), as well
>> as further improving Spark integration (I'm a PMC member on that project).
>> 
>> Mike and I, together with Luciano (who is a mentor on this proposal) would
>> like to volunteer our services as initial committers, if that is agreeable.
>> 
>> Kind regards
>> Nick
>> mlnick@apache.org
>> 
>> 
>> 
>>> 
>>> ---------- Forwarded message ----------
>>> From: Andrew Purtell <apurtell@apache.org>
>>> To: "general@incubator.apache.org" <general@incubator.apache.org>
>>> Cc:
>>> Date: Fri, 13 May 2016 13:41:38 -0700
>>> Subject: [DISCUSS] PredictionIO incubation proposal
>>> Greetings,
>>> 
>>> It is my pleasure to propose the PredictionIO project for incubation at
>>> the Apache Software Foundation. PredictionIO is a popular open source
>>> Machine Learning Server built on top of a state-of-the-art open source
>>> stack, including several Apache technologies, that enables developers to
>>> manage and deploy production-ready predictive services for various kinds
>>> of machine learning tasks, with more than 400 production deployments
>>> around the world and a growing contributor community.
>>> 
>>> 
>>> The text of the proposal is included below and is also available at
>>> https://wiki.apache.org/incubator/PredictionIO
>>> 
>>> Best regards,
>>> Andrew Purtell
>>> 
>>> 
>>> = PredictionIO Proposal =
>>> 
>>> === Abstract ===
>>> PredictionIO is an open source Machine Learning Server built on top of a
>>> state-of-the-art open source stack, that enables developers to manage and
>>> deploy production-ready predictive services for various kinds of machine
>>> learning tasks.
>>> 
>>> === Proposal ===
>>> The PredictionIO platform consists of the following components:
>>> 
>>> * PredictionIO framework - provides the machine learning stack for
>>> building, evaluating and deploying engines with machine learning
>>> algorithms. It uses Apache Spark for processing.
>>> 
>>> * Event Server - the machine learning analytics layer for unifying events
>>> from multiple platforms. It can use Apache HBase or any JDBC backend
>>> as its data store.
>>> 
>>> The PredictionIO community also maintains a Template Gallery, a place to
>>> publish and download (free or proprietary) engine templates for different
>>> types of machine learning applications, which is a complementary part of
>>> the project. At this point we exclude the Template Gallery from the
>>> proposal, as it has a separate set of contributors and we’re not familiar
>>> with an Apache-approved mechanism to maintain such a gallery.
>>> 
>>> You can find the Template Gallery at https://templates.prediction.io/
>>> 
>>> === Background ===
>>> PredictionIO was started with a mission to democratize and bring machine
>>> learning to the masses.
>>> 
>>> Machine learning has traditionally been a luxury for big companies like
>>> Google, Facebook, and Netflix. There are ML libraries and tools lying
>>> around the internet, but putting them all together as a production-ready
>>> infrastructure is a very resource-intensive task that is barely within
>>> reach of individuals or small businesses.
>>> 
>>> PredictionIO is a production-ready, full stack machine learning system
>>> that allows organizations of any scale to quickly deploy machine learning
>>> capabilities. It comes with official and community-contributed machine
>>> learning engine templates that are easy to customize.
>>> 
>>> === Rationale ===
>>> As the usage of and the number of contributors to PredictionIO have grown
>>> larger and more diverse, we have sought an independent framework in which
>>> the project can keep thriving. We believe the Apache foundation is a great
>>> fit. Joining Apache would ensure that tried and true processes and
>>> procedures are in place for the growing number of organizations
>>> interested in contributing to PredictionIO. PredictionIO is also a good
>>> fit for the Apache foundation. PredictionIO was built on top of several
>>> Apache projects (HBase, Spark, Hadoop). We are familiar with the Apache
>>> process and believe that the democratic and meritocratic nature of the
>>> foundation aligns with the project goals.
>>> 
>>> === Initial Goals ===
>>> The initial milestones will be to move the existing codebase to Apache
>>> and integrate with the Apache development process. Once this is
>>> accomplished, we plan for incremental development and releases that
>>> follow the Apache guidelines, as well as growing our developer and user
>>> communities.
>>> 
>>> === Current Status ===
>>> PredictionIO has undergone nine minor releases and many patches.
>>> PredictionIO is being used in production by Salesforce.com as well as
>>> many other organizations and apps. The PredictionIO codebase is currently
>>> hosted at GitHub, which will form the basis of the Apache git repository.
>>> 
>>> ==== Meritocracy ====
>>> We plan to invest in supporting a meritocracy. We will discuss the
>>> requirements in an open forum. We intend to invite additional developers
>>> to participate. We will encourage and monitor community participation so
>>> that privileges can be extended to those that contribute.
>>> 
>>> ==== Community ====
>>> Acceptance into the Apache foundation would bolster the already strong
>>> user and developer community around PredictionIO. That community includes
>>> many contributors from various other companies, and an active mailing
>>> list composed of hundreds of users.
>>> 
>>> ==== Core Developers ====
>>> The core developers of our project are listed in our contributors and
>>> initial PPMC below. Though many are employed at Salesforce.com, there are
>>> also engineers from ActionML, and independent developers.
>>> 
>>> === Alignment ===
>>> The ASF is the natural choice to host the PredictionIO project as its
>>> goal is democratizing Machine Learning by making it more easily
>>> accessible to every user/developer. PredictionIO is built on top of
>>> several top level Apache projects as outlined above.
>>> 
>>> === Known Risks ===
>>> 
>>> ==== Orphaned products ====
>>> PredictionIO has a solid and growing community. It is deployed on
>>> production environments by companies of all sizes to run various kinds of
>>> predictive engines.
>>> 
>>> In addition to the community contribution to the PredictionIO framework,
>>> the community is also actively contributing new engines to the Template
>>> Gallery as well as SDKs and documentation for the project. Salesforce is
>>> committed to utilizing and advancing the PredictionIO code base and
>>> supporting its user community.
>>> 
>>> ==== Inexperience with Open Source ====
>>> PredictionIO has existed as a healthy open source project for almost two
>>> years and is the most starred Scala project on GitHub. All of the
>>> proposed committers have contributed to ASF and Linux Foundation open
>>> source projects. Several current committers on Apache projects and Apache
>>> Members are involved in this proposal and intend to provide mentorship.
>>> 
>>> ==== Homogeneous Developers ====
>>> The initial list of committers includes developers from several
>>> institutions, including Salesforce, ActionML, Channel4, USC as well as
>>> unaffiliated developers.
>>> 
>>> ==== Reliance on Salaried Developers ====
>>> Like most open source projects, PredictionIO receives substantial support
>>> from salaried developers. PredictionIO development is partially supported
>>> by Salesforce.com, but there are many contributors from various other
>>> companies, and an active mailing list composed of hundreds of users. We
>>> will continue our efforts to keep stewardship of the project
>>> independent of salaried developers by meritocratically promoting those
>>> contributors to committers.
>>> 
>>> ==== Relationships with Other Apache Products ====
>>> PredictionIO relies heavily on top-level Apache projects such as Apache
>>> Spark, HBase and Hadoop. However, it brings distinct functionality
>>> rather than just an abstraction: Machine Learning in a plug-and-play
>>> fashion.
>>> 
>>> Compared to Apache Mahout, which focuses on the development of a wide
>>> variety of algorithms, PredictionIO offers a platform to manage the whole
>>> machine learning workflow, including data collection, data preparation,
>>> modeling, deployment and management of predictive services in production
>>> environments.
>>> 
>>> ==== An Excessive Fascination with the Apache Brand ====
>>> PredictionIO is already a widely known open source project. This proposal
>>> is not for the purpose of generating publicity. Rather, the primary
>>> benefits to joining Apache are those outlined in the Rationale section.
>>> 
>>> === Documentation ===
>>> PredictionIO boasts rich and live documentation, which is included in the
>>> code repo (docs/manual directory), built with Middleman, and publicly
>>> hosted at https://docs.prediction.io
>>> 
>>> === Initial Source and Intellectual Property Submission Plan ===
>>> Currently, the PredictionIO codebase is distributed under the Apache 2.0
>>> License and hosted on GitHub:
>>> https://github.com/PredictionIO/PredictionIO
>>> 
>>> === External Dependencies ===
>>> PredictionIO has the following external dependencies:
>>> * Apache Hadoop 2.4.0 (optional, required only if YARN and HDFS are
>>> needed)
>>> * Apache Spark 1.3.0 for Hadoop 2.4
>>> * Java SE Development Kit 8
>>> * and one of the following sets:
>>>   * PostgreSQL 9.1
>>>   or
>>>   * MySQL 5.1
>>>   or
>>>   * Apache HBase 0.98.6
>>>   * Elasticsearch 1.4.0
>>> 
>>> Upon acceptance to the incubator, we would begin a thorough analysis of
>>> all transitive dependencies to verify this information and introduce
>>> license checking into the build and release process by integrating with
>>> Apache RAT.
>>> 
>>> === Cryptography ===
>>> PredictionIO does not include cryptographic code. We utilize standard
>>> JCE and JSSE APIs provided by the Java Runtime Environment.
>>> 
>>> === Required Resources ===
>>> We request that the following resources be created for the project to use:
>>> 
>>> ==== Mailing lists ====
>>> 
>>> predictionio-private@incubator.apache.org (with moderated subscriptions)
>>> 
>>> predictionio-dev
>>> 
>>> predictionio-user
>>> 
>>> predictionio-commits
>>> 
>>> We will migrate the existing PredictionIO mailing lists.
>>> 
>>> ==== Git repository ====
>>> The PredictionIO team would like to use Git for source control, due to
>>> our current use of GitHub.
>>> 
>>> git://git.apache.org/incubator-predictionio
>>> 
>>> ==== Documentation ====
>>> https://predictionio.incubator.apache.org/docs/
>>> 
>>> ==== JIRA instance ====
>>> PredictionIO currently uses the GitHub issue tracking system associated
>>> with its repository: https://github.com/PredictionIO/PredictionIO/issues
>>> We will migrate to Apache JIRA.
>>> 
>>> JIRA PREDICTIONIO
>>> https://issues.apache.org/jira/browse/PREDICTIONIO
>>> 
>>> ==== Other Resources ====
>>> * TravisCI for builds and test running.
>>> 
>>> * PredictionIO's documentation, included in the code repo (docs/manual
>>> directory), is built with Middleman and publicly hosted at
>>> https://docs.prediction.io
>>> 
>>> * A blog to drive adoption and excitement at https://blog.prediction.io
>>> 
>>> === Initial Committers ===
>>> 
>>> * Pat Ferrel
>>> 
>>> * Tamas Jambor
>>> 
>>> * Justin Yip
>>> 
>>> * Xusen Yin
>>> 
>>> * Lee Moon Soo
>>> 
>>> * Donald Szeto
>>> 
>>> * Kenneth Chan
>>> 
>>> * Tom Chan
>>> 
>>> * Simon Chan
>>> 
>>> * Marco Vivero
>>> 
>>> * Matthew Tovbin
>>> 
>>> * Yevgeny Khodorkovsky
>>> 
>>> * Felipe Oliveira
>>> 
>>> * Vitaly Gordon
>>> 
>>> === Affiliations ===
>>> 
>>> * Pat Ferrel - ActionML
>>> 
>>> * Tamas Jambor - Channel4
>>> 
>>> * Justin Yip - independent
>>> 
>>> * Xusen Yin - USC
>>> 
>>> * Lee Moon Soo - NFLabs
>>> 
>>> * Donald Szeto - Salesforce
>>> 
>>> * Kenneth Chan - Salesforce
>>> 
>>> * Tom Chan - Salesforce
>>> 
>>> * Simon Chan - Salesforce
>>> 
>>> * Marco Vivero - Salesforce
>>> 
>>> * Matthew Tovbin - Salesforce
>>> 
>>> * Yevgeny Khodorkovsky - Salesforce
>>> 
>>> * Felipe Oliveira - Salesforce
>>> 
>>> * Vitaly Gordon - Salesforce
>>> 
>>> === Sponsors ===
>>> 
>>> ==== Champion ====
>>> 
>>> Andrew Purtell <apurtell at apache dot org>
>>> 
>>> ==== Nominated Mentors ====
>>> 
>>> * Andrew Purtell <apurtell at apache dot org>
>>> 
>>> * James Taylor <jtaylor at apache dot org>
>>> 
>>> * Lars Hofhansl <larsh at apache dot org>
>>> 
>>> * Suneel Marthi <smarthi at apache dot org>
>>> 
>>> * Xiangrui Meng <meng at apache dot org>
>>> 
>>> * Luciano Resende <lresende at apache dot org>
>>> 
>>> ==== Sponsoring Entity ====
>>> 
>>> Apache Incubator PMC
>>> 
>> 
> 
> 
> 
> --
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
> 



