incubator-general mailing list archives

From Suneel Marthi <smar...@apache.org>
Subject Re: [DISCUSS] PredictionIO incubation proposal
Date Mon, 16 May 2016 17:28:24 GMT
+1 to integrating with Beam



On Mon, May 16, 2016 at 1:13 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
wrote:

> Hi,
>
> I second Roman here.
>
> Using Beam to abstract the execution environment would provide a very
> flexible architecture for PredictionIO.
>
> It would benefit both projects.
>
> Regards
> JB
>
> On 05/15/2016 02:32 AM, Roman Shaposhnik wrote:
>
>> Super excited to see this proposal! This will finally allow us to have an
>> ASF-managed backend for the next-generation data-driven apps that I see
>> emerging quite rapidly.
>>
>> The proposal looks great to me (although I'd recommend calling out Scala
>> as an implementation language more prominently, since it may attract
>> additional developers with an affinity for it).
>>
>> I do have two questions about technology:
>>     1. Do you think it would be possible to leverage Apache Beam (incubating)
>>        for abstracting away the dependency on execution frameworks? My
>>        understanding is that PredictionIO currently only runs on Spark.
>>     2. Is an integration with Apache Zeppelin possible?
>>
>> Thanks,
>> Roman.
>>
>>
>> On Fri, May 13, 2016 at 1:41 PM, Andrew Purtell <apurtell@apache.org>
>> wrote:
>>
>>> Greetings,
>>>
>>> It is my pleasure to propose the PredictionIO project for incubation at
>>> the Apache Software Foundation.
>>>
>>> PredictionIO is a popular open source Machine Learning Server built on top
>>> of a state-of-the-art open source stack, including several Apache
>>> technologies, that enables developers to manage and deploy production-ready
>>> predictive services for various kinds of machine learning tasks, with more
>>> than 400 production deployments around the world and a growing contributor
>>> community.
>>>
>>>
>>> The text of the proposal is included below and is also available at
>>> https://wiki.apache.org/incubator/PredictionIO
>>>
>>> Best regards,
>>> Andrew Purtell
>>>
>>>
>>> = PredictionIO Proposal =
>>>
>>> === Abstract ===
>>> PredictionIO is an open source Machine Learning Server built on top of a
>>> state-of-the-art open source stack that enables developers to manage and
>>> deploy production-ready predictive services for various kinds of machine
>>> learning tasks.
>>>
>>> === Proposal ===
>>> The PredictionIO platform consists of the following components:
>>>
>>>   * PredictionIO framework - provides the machine learning stack for
>>>   building, evaluating and deploying engines with machine learning
>>>   algorithms. It uses Apache Spark for processing.
>>>
>>>   * Event Server - the machine learning analytics layer for unifying events
>>>   from multiple platforms. It can use Apache HBase or any JDBC backend
>>>   as its data store.
>>>
>>> The PredictionIO community also maintains a Template Gallery, a place to
>>> publish and download (free or proprietary) engine templates for different
>>> types of machine learning applications, which is a complementary part of
>>> the project. At this point we exclude the Template Gallery from the
>>> proposal, as it has a separate set of contributors and we’re not familiar
>>> with an Apache-approved mechanism to maintain such a gallery.
>>>
>>> You can find the Template Gallery at https://templates.prediction.io/
>>>
>>> === Background ===
>>> PredictionIO was started with a mission to democratize and bring machine
>>> learning to the masses.
>>>
>>> Machine learning has traditionally been a luxury for big companies like
>>> Google, Facebook, and Netflix. There are ML libraries and tools scattered
>>> around the internet, but putting them all together into a
>>> production-ready infrastructure is a very resource-intensive task that is
>>> barely within reach of individuals or small businesses.
>>>
>>> PredictionIO is a production-ready, full stack machine learning system
>>> that allows organizations of any scale to quickly deploy machine learning
>>> capabilities. It comes with official and community-contributed machine
>>> learning engine templates that are easy to customize.
>>>
>>> === Rationale ===
>>> As the usage of and number of contributors to PredictionIO have grown
>>> larger and more diverse, we have sought an independent framework for the
>>> project to keep thriving. We believe the Apache foundation is a great fit.
>>> Joining Apache would ensure that tried and true processes and procedures
>>> are in place for the growing number of organizations interested in
>>> contributing to PredictionIO. PredictionIO is also a good fit for the
>>> Apache foundation. PredictionIO was built on top of several Apache projects
>>> (HBase, Spark, Hadoop). We are familiar with the Apache process and believe
>>> that the democratic and meritocratic nature of the foundation aligns with
>>> the project goals.
>>>
>>> === Initial Goals ===
>>> The initial milestones will be to move the existing codebase to Apache and
>>> integrate with the Apache development process. Once this is accomplished,
>>> we plan for incremental development and releases that follow the Apache
>>> guidelines, as well as growing our developer and user communities.
>>>
>>> === Current Status ===
>>> PredictionIO has undergone nine minor releases and many patches.
>>> PredictionIO is being used in production by Salesforce.com as well as many
>>> other organizations and apps. The PredictionIO codebase is currently
>>> hosted at GitHub, which will form the basis of the Apache git repository.
>>>
>>> ==== Meritocracy ====
>>> We plan to invest in supporting a meritocracy. We will discuss the
>>> requirements in an open forum. We intend to invite additional developers
>>> to participate. We will encourage and monitor community participation so
>>> that privileges can be extended to those that contribute.
>>>
>>> ==== Community ====
>>> Acceptance into the Apache foundation would bolster the already strong
>>> user and developer community around PredictionIO. That community includes
>>> many contributors from various other companies, and an active mailing list
>>> composed of hundreds of users.
>>>
>>> ==== Core Developers ====
>>> The core developers of our project are listed in our contributors and
>>> initial PPMC below. Though many are employed at Salesforce.com, there are
>>> also engineers from ActionML, and independent developers.
>>>
>>> === Alignment ===
>>> The ASF is the natural choice to host the PredictionIO project as its goal
>>> is democratizing Machine Learning by making it more easily accessible to
>>> every user/developer. PredictionIO is built on top of several top level
>>> Apache projects as outlined above.
>>>
>>> === Known Risks ===
>>>
>>> ==== Orphaned products ====
>>> PredictionIO has a solid and growing community. It is deployed in
>>> production environments by companies of all sizes to run various kinds of
>>> predictive engines.
>>>
>>> In addition to the community contributions to the PredictionIO framework,
>>> the community is also actively contributing new engines to the Template
>>> Gallery as well as SDKs and documentation for the project. Salesforce is
>>> committed to utilizing and advancing the PredictionIO code base and
>>> supporting its user community.
>>>
>>> ==== Inexperience with Open Source ====
>>> PredictionIO has existed as a healthy open source project for almost two
>>> years and is the most starred Scala project on GitHub. All of the proposed
>>> committers have contributed to ASF and Linux Foundation open source
>>> projects. Several current committers on Apache projects and Apache Members
>>> are involved in this proposal and intend to provide mentorship.
>>>
>>> ==== Homogeneous Developers ====
>>> The initial list of committers includes developers from several
>>> institutions, including Salesforce, ActionML, Channel4, USC as well as
>>> unaffiliated developers.
>>>
>>> ==== Reliance on Salaried Developers ====
>>> Like most open source projects, PredictionIO receives substantial support
>>> from salaried developers. PredictionIO development is partially supported
>>> by Salesforce.com, but there are many contributors from various other
>>> companies, and an active mailing list composed of hundreds of users. We
>>> will continue our efforts to ensure that stewardship of the project is
>>> independent of salaried developers by meritocratically promoting those
>>> contributors to committers.
>>>
>>> ==== Relationships with Other Apache Products ====
>>> PredictionIO relies heavily on top level Apache projects such as Apache
>>> Spark, HBase and Hadoop. However, it brings distinct functionality rather
>>> than just an abstraction: Machine Learning in a plug-and-play fashion.
>>>
>>> Compared to Apache Mahout, which focuses on the development of a wide
>>> variety of algorithms, PredictionIO offers a platform to manage the whole
>>> machine learning workflow, including data collection, data preparation,
>>> modeling, deployment and management of predictive services in production
>>> environments.
>>>
>>> ==== An Excessive Fascination with the Apache Brand ====
>>> PredictionIO is already a widely known open source project. This proposal
>>> is not for the purpose of generating publicity. Rather, the primary
>>> benefits to joining Apache are those outlined in the Rationale section.
>>>
>>> === Documentation ===
>>> PredictionIO boasts rich and live documentation, which is included in the
>>> code repo (docs/manual directory), built with Middleman, and publicly
>>> hosted at https://docs.prediction.io
>>>
>>> === Initial Source and Intellectual Property Submission Plan ===
>>> Currently, the PredictionIO codebase is distributed under the Apache 2.0
>>> License and hosted on GitHub:
>>> https://github.com/PredictionIO/PredictionIO
>>>
>>> === External Dependencies ===
>>> PredictionIO has the following external dependencies:
>>>   * Apache Hadoop 2.4.0 (optional, required only if YARN and HDFS are needed)
>>>   * Apache Spark 1.3.0 for Hadoop 2.4
>>>   * Java SE Development Kit 8
>>>   * and one of the following sets:
>>>     * PostgreSQL 9.1
>>>
>>>     or
>>>
>>>     * MySQL 5.1
>>>
>>>     or
>>>
>>>     * Apache HBase 0.98.6
>>>     * Elasticsearch 1.4.0
>>>
>>> Upon acceptance to the incubator, we would begin a thorough analysis of
>>> all transitive dependencies to verify this information and introduce
>>> license checking into the build and release process by integrating with
>>> Apache RAT.
>>>
>>> === Cryptography ===
>>> PredictionIO does not include cryptographic code. We utilize standard
>>> JCE and JSSE APIs provided by the Java Runtime Environment.
>>>
>>> === Required Resources ===
>>> We request that the following resources be created for the project to use:
>>>
>>> ==== Mailing lists ====
>>>
>>> predictionio-private@incubator.apache.org (with moderated subscriptions)
>>>
>>>
>>> predictionio-dev
>>>
>>> predictionio-user
>>>
>>> predictionio-commits
>>>
>>> We will migrate the existing PredictionIO mailing lists.
>>>
>>> ==== Git repository ====
>>> The PredictionIO team would like to use Git for source control, due to our
>>> current use of GitHub.
>>>
>>> git://git.apache.org/incubator-predictionio
>>>
>>> ==== Documentation ====
>>> https://predictionio.incubator.apache.org/docs/
>>>
>>> ==== JIRA instance ====
>>> PredictionIO currently uses the GitHub issue tracking system associated
>>> with its repository (https://github.com/PredictionIO/PredictionIO/issues).
>>> We will migrate to Apache JIRA.
>>>
>>> JIRA PREDICTIONIO
>>> https://issues.apache.org/jira/browse/PREDICTIONIO
>>>
>>> ==== Other Resources ====
>>> * TravisCI for builds and test running.
>>>
>>> * PredictionIO's documentation, included in the code repo (docs/manual
>>> directory), is built with Middleman and publicly hosted at
>>> https://docs.prediction.io
>>>
>>> * A blog to drive adoption and excitement at https://blog.prediction.io
>>>
>>> === Initial Committers ===
>>>
>>> * Pat Ferrell
>>>
>>> * Tamas Jambor
>>>
>>> * Justin Yip
>>>
>>> * Xusen Yin
>>>
>>> * Lee Moon Soo
>>>
>>> * Donald Szeto
>>>
>>> * Kenneth Chan
>>>
>>> * Tom Chan
>>>
>>> * Simon Chan
>>>
>>> * Marco Vivero
>>>
>>> * Matthew Tovbin
>>>
>>> * Yevgeny Khodorkovsky
>>>
>>> * Felipe Oliveira
>>>
>>> * Vitaly Gordon
>>>
>>> === Affiliations ===
>>>
>>> * Pat Ferrell - ActionML
>>>
>>> * Tamas Jambor - Channel4
>>>
>>> * Justin Yip - independent
>>>
>>> * Xusen Yin - USC
>>>
>>> * Lee Moon Soo - NFLabs
>>>
>>> * Donald Szeto - Salesforce
>>>
>>> * Kenneth Chan - Salesforce
>>>
>>> * Tom Chan - Salesforce
>>>
>>> * Simon Chan - Salesforce
>>>
>>> * Marco Vivero - Salesforce
>>>
>>> * Matthew Tovbin - Salesforce
>>>
>>> * Yevgeny Khodorkovsky - Salesforce
>>>
>>> * Felipe Oliveira - Salesforce
>>>
>>> * Vitaly Gordon - Salesforce
>>>
>>> === Sponsors ===
>>>
>>> ==== Champion ====
>>>
>>> Andrew Purtell <apurtell at apache dot org>
>>>
>>> ==== Nominated Mentors ====
>>>
>>> * Andrew Purtell <apurtell at apache dot org>
>>>
>>> * James Taylor <jtaylor at apache dot org>
>>>
>>> * Lars Hofhansl <larsh at apache dot org>
>>>
>>> * Suneel Marthi <smarthi at apache dot org>
>>>
>>> * Xiangrui Meng <meng at apache dot org>
>>>
>>> * Luciano Resende <lresende at apache dot org>
>>>
>>> ==== Sponsoring Entity ====
>>>
>>> Apache Incubator PMC
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>
