Mailing-List: contact general-help@incubator.apache.org; run by ezmlm
Reply-To: general@incubator.apache.org
Delivered-To: mailing list general@incubator.apache.org
MIME-Version: 1.0
References: <5739FFC9.30905@nanthrax.net>
Date: Tue, 17 May 2016 16:57:59 -0400
Subject: Re: [DISCUSS] PredictionIO incubation proposal
From: Suneel Marthi
To: general@incubator.apache.org
Content-Type: multipart/alternative; boundary=001a114dc6940d250f05330fffb2
archived-at: Tue, 17 May 2016 20:58:04 -0000

--001a114dc6940d250f05330fffb2
Content-Type: text/plain; charset=UTF-8

Thanks for having me as a mentor for PIO.

I would like to be added to the initial list of committers and am looking to actively participate in the development too. I am not sure if my being a mentor automatically grants me the 'commit' karma.

It's already been suggested earlier in this thread by Roman and Jean-Baptiste that the project needs to be decoupled from Spark and integrated with Beam. It would be good to reduce the reliance on Spark-Submit from what I have seen of the project so far.

But let's not talk architecture and design here when the project's not in the Incubator yet. :)

On Tue, May 17, 2016 at 4:09 PM, Henry Saputra wrote:

> Cool, this will make the code grant process easier =)
>
> The initial committers and mentors look great.
> I am sure more will come as contributions start pouring into the project.
>
> Looking forward to the VOTE thread soon.
>
> - Henry
>
> On Mon, May 16, 2016 at 12:07 PM, Simon Chan wrote:
>
> > Yes, it includes everyone from PredictionIO who contributed code before the acquisition and still wants to be involved in the project.
> >
> > We may have missed "Alex Merritt", going to add him to the list soon.
> >
> > Simon
> >
> >
> > On Mon, May 16, 2016 at 11:58 AM, Suneel Marthi wrote:
> >
> > > I do have a question about the proposed list of committers.
> > >
> > > Does the list also include all of those folks who were with PredictionIO (and had contributed to the project) and then chose to leave when PIO was acquired by Salesforce?
> > >
> > > On Mon, May 16, 2016 at 1:13 PM, Jean-Baptiste Onofré wrote:
> > >
> > > > By the way, we have some discussion about integrating Zeppelin with Beam ;)
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On 05/15/2016 02:32 AM, Roman Shaposhnik wrote:
> > > >
> > > >> Super excited to see this proposal! This will finally allow us to have an ASF managed backend for next generation data-driven apps that I see emerging quite rapidly.
> > > >>
> > > >> The proposal looks great to me (although I'd recommend calling out Scala as an implementation language more prominently since it may attract additional developers with affinity to it).
> > > >>
> > > >> I do have two questions about technology:
> > > >>    1. Do you think it would be possible to leverage Apache Beam (incubating) for abstracting away dependency on execution frameworks? My understanding is that PredictionIO currently only runs on Spark.
> > > >>    2. Is a potential integration with Apache Zeppelin possible?
> > > >>
> > > >> Thanks,
> > > >> Roman.
> > > >>
> > > >> On Fri, May 13, 2016 at 1:41 PM, Andrew Purtell <apurtell@apache.org> wrote:
> > > >>
> > > >>> Greetings,
> > > >>>
> > > >>> It is my pleasure to propose the PredictionIO project for incubation at the Apache Software Foundation.
> > > >>>
> > > >>> PredictionIO is a popular open source Machine Learning Server built on top of a state-of-the-art open source stack, including several Apache technologies, that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks, with more than 400 production deployments around the world and a growing contributor community.
> > > >>>
> > > >>>
> > > >>> The text of the proposal is included below and is also available at
> > > >>> https://wiki.apache.org/incubator/PredictionIO
> > > >>>
> > > >>> Best regards,
> > > >>> Andrew Purtell
> > > >>>
> > > >>>
> > > >>> = PredictionIO Proposal =
> > > >>>
> > > >>> === Abstract ===
> > > >>> PredictionIO is an open source Machine Learning Server built on top of a state-of-the-art open source stack that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks.
> > > >>>
> > > >>> === Proposal ===
> > > >>> The PredictionIO platform consists of the following components:
> > > >>>
> > > >>> * PredictionIO framework - provides the machine learning stack for building, evaluating and deploying engines with machine learning algorithms. It uses Apache Spark for processing.
> > > >>>
> > > >>> * Event Server - the machine learning analytics layer for unifying events from multiple platforms. It can use Apache HBase or any JDBC backend as its data store.
> > > >>>
> > > >>> The PredictionIO community also maintains a Template Gallery, a place to publish and download (free or proprietary) engine templates for different types of machine learning applications, which is a complementary part of the project. At this point we exclude the Template Gallery from the proposal, as it has a separate set of contributors and we’re not familiar with an Apache-approved mechanism to maintain such a gallery.
> > > >>>
> > > >>> You can find the Template Gallery at https://templates.prediction.io/
> > > >>>
> > > >>> === Background ===
> > > >>> PredictionIO was started with a mission to democratize and bring machine learning to the masses.
> > > >>>
> > > >>> Machine learning has traditionally been a luxury for big companies like Google, Facebook, and Netflix. There are ML libraries and tools lying around the internet, but putting them all together as a production-ready infrastructure is a very resource-intensive task that is out of reach for most individuals and small businesses.
> > > >>>
> > > >>> PredictionIO is a production-ready, full stack machine learning system that allows organizations of any scale to quickly deploy machine learning capabilities. It comes with official and community-contributed machine learning engine templates that are easy to customize.
> > > >>>
> > > >>> === Rationale ===
> > > >>> As the usage of and the number of contributors to PredictionIO have grown larger and more diverse, we have sought an independent framework for the project to keep thriving. We believe the Apache foundation is a great fit. Joining Apache would ensure that tried and true processes and procedures are in place for the growing number of organizations interested in contributing to PredictionIO. PredictionIO is also a good fit for the Apache foundation. PredictionIO was built on top of several Apache projects (HBase, Spark, Hadoop). We are familiar with the Apache process and believe that the democratic and meritocratic nature of the foundation aligns with the project goals.
> > > >>>
> > > >>> === Initial Goals ===
> > > >>> The initial milestones will be to move the existing codebase to Apache and integrate with the Apache development process. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines, as well as growing our developer and user communities.
> > > >>>
> > > >>> === Current Status ===
> > > >>> PredictionIO has undergone nine minor releases and many patches. PredictionIO is being used in production by Salesforce.com as well as many other organizations and apps. The PredictionIO codebase is currently hosted at GitHub, which will form the basis of the Apache git repository.
> > > >>>
> > > >>> ==== Meritocracy ====
> > > >>> We plan to invest in supporting a meritocracy. We will discuss the requirements in an open forum. We intend to invite additional developers to participate. We will encourage and monitor community participation so that privileges can be extended to those that contribute.
> > > >>>
> > > >>> ==== Community ====
> > > >>> Acceptance into the Apache foundation would bolster the already strong user and developer community around PredictionIO. That community includes many contributors from various other companies, and an active mailing list composed of hundreds of users.
> > > >>>
> > > >>> ==== Core Developers ====
> > > >>> The core developers of our project are listed in our contributors and initial PPMC below. Though many are employed at Salesforce.com, there are also engineers from ActionML, and independent developers.
> > > >>>
> > > >>> === Alignment ===
> > > >>> The ASF is the natural choice to host the PredictionIO project as its goal is democratizing Machine Learning by making it more easily accessible to every user/developer. PredictionIO is built on top of several top-level Apache projects as outlined above.
> > > >>>
> > > >>> === Known Risks ===
> > > >>>
> > > >>> ==== Orphaned products ====
> > > >>> PredictionIO has a solid and growing community. It is deployed in production environments by companies of all sizes to run various kinds of predictive engines.
> > > >>>
> > > >>> In addition to the community contribution to the PredictionIO framework, the community is also actively contributing new engines to the Template Gallery as well as SDKs and documentation for the project. Salesforce is committed to utilizing and advancing the PredictionIO code base and supporting its user community.
> > > >>>
> > > >>> ==== Inexperience with Open Source ====
> > > >>> PredictionIO has existed as a healthy open source project for almost two years and is the most starred Scala project on GitHub. All of the proposed committers have contributed to ASF and Linux Foundation open source projects. Several current committers on Apache projects and Apache Members are involved in this proposal and intend to provide mentorship.
> > > >>>
> > > >>> ==== Homogeneous Developers ====
> > > >>> The initial list of committers includes developers from several institutions, including Salesforce, ActionML, Channel4, and USC, as well as unaffiliated developers.
> > > >>>
> > > >>> ==== Reliance on Salaried Developers ====
> > > >>> Like most open source projects, PredictionIO receives substantial support from salaried developers. PredictionIO development is partially supported by Salesforce.com, but there are many contributors from various other companies, and an active mailing list composed of hundreds of users. We will continue our efforts to ensure that stewardship of the project is independent of salaried developers by meritocratically promoting those contributors to committers.
> > > >>>
> > > >>> ==== Relationships with Other Apache Products ====
> > > >>> PredictionIO relies heavily on top-level Apache projects such as Apache Spark, HBase and Hadoop. However, it brings distinct functionality rather than just an abstraction: Machine Learning in a plug-and-play fashion.
> > > >>>
> > > >>> Compared to Apache Mahout, which focuses on the development of a wide variety of algorithms, PredictionIO offers a platform to manage the whole machine learning workflow, including data collection, data preparation, modeling, deployment and management of predictive services in production environments.
> > > >>>
> > > >>> ==== An Excessive Fascination with the Apache Brand ====
> > > >>> PredictionIO is already a widely known open source project. This proposal is not for the purpose of generating publicity. Rather, the primary benefits to joining Apache are those outlined in the Rationale section.
> > > >>>
> > > >>> === Documentation ===
> > > >>> PredictionIO boasts rich, live documentation. It is included in the code repo (docs/manual directory), built with Middleman, and publicly hosted at https://docs.prediction.io
> > > >>>
> > > >>> === Initial Source and Intellectual Property Submission Plan ===
> > > >>> Currently, the PredictionIO codebase is distributed under the Apache 2.0 License and hosted on GitHub: https://github.com/PredictionIO/PredictionIO
> > > >>>
> > > >>> === External Dependencies ===
> > > >>> PredictionIO has the following external dependencies:
> > > >>> * Apache Hadoop 2.4.0 (optional, required only if YARN and HDFS are needed)
> > > >>> * Apache Spark 1.3.0 for Hadoop 2.4
> > > >>> * Java SE Development Kit 8
> > > >>> * and one of the following sets:
> > > >>>
> > > >>>   * PostgreSQL 9.1
> > > >>>
> > > >>>   or
> > > >>>
> > > >>>   * MySQL 5.1
> > > >>>
> > > >>>   or
> > > >>>
> > > >>>   * Apache HBase 0.98.6
> > > >>>   * Elasticsearch 1.4.0
> > > >>>
> > > >>> Upon acceptance to the incubator, we would begin a thorough analysis of all transitive dependencies to verify this information and introduce license checking into the build and release process by integrating with Apache RAT.
> > > >>>
> > > >>> === Cryptography ===
> > > >>> PredictionIO does not include cryptographic code. We utilize standard JCE and JSSE APIs provided by the Java Runtime Environment.
> > > >>>
> > > >>> === Required Resources ===
> > > >>> We request that the following resources be created for the project to use:
> > > >>>
> > > >>> ==== Mailing lists ====
> > > >>>
> > > >>> predictionio-private@incubator.apache.org (with moderated subscriptions)
> > > >>>
> > > >>> predictionio-dev
> > > >>>
> > > >>> predictionio-user
> > > >>>
> > > >>> predictionio-commits
> > > >>>
> > > >>> We will migrate the existing PredictionIO mailing lists.
> > > >>>
> > > >>> ==== Git repository ====
> > > >>> The PredictionIO team would like to use Git for source control, due to our current use of GitHub.
> > > >>>
> > > >>> git://git.apache.org/incubator-predictionio
> > > >>>
> > > >>> ==== Documentation ====
> > > >>> https://predictionio.incubator.apache.org/docs/
> > > >>>
> > > >>> ==== JIRA instance ====
> > > >>> PredictionIO currently uses the GitHub issue tracking system associated with its repository: https://github.com/PredictionIO/PredictionIO/issues. We will migrate to Apache JIRA.
> > > >>>
> > > >>> JIRA PREDICTIONIO
> > > >>> https://issues.apache.org/jira/browse/PREDICTIONIO
> > > >>>
> > > >>> ==== Other Resources ====
> > > >>> * TravisCI for builds and test running.
> > > >>>
> > > >>> * PredictionIO's documentation, included in the code repo (docs/manual directory), is built with Middleman and publicly hosted at https://docs.prediction.io
> > > >>>
> > > >>> * A blog to drive adoption and excitement at https://blog.prediction.io
> > > >>>
> > > >>> === Initial Committers ===
> > > >>>
> > > >>> * Pat Ferrell
> > > >>>
> > > >>> * Tamas Jambor
> > > >>>
> > > >>> * Justin Yip
> > > >>>
> > > >>> * Xusen Yin
> > > >>>
> > > >>> * Lee Moon Soo
> > > >>>
> > > >>> * Donald Szeto
> > > >>>
> > > >>> * Kenneth Chan
> > > >>>
> > > >>> * Tom Chan
> > > >>>
> > > >>> * Simon Chan
> > > >>>
> > > >>> * Marco Vivero
> > > >>>
> > > >>> * Matthew Tovbin
> > > >>>
> > > >>> * Yevgeny Khodorkovsky
> > > >>>
> > > >>> * Felipe Oliveira
> > > >>>
> > > >>> * Vitaly Gordon
> > > >>>
> > > >>> === Affiliations ===
> > > >>>
> > > >>> * Pat Ferrell - ActionML
> > > >>>
> > > >>> * Tamas Jambor - Channel4
> > > >>>
> > > >>> * Justin Yip - independent
> > > >>>
> > > >>> * Xusen Yin - USC
> > > >>>
> > > >>> * Lee Moon Soo - NFLabs
> > > >>>
> > > >>> * Donald Szeto - Salesforce
> > > >>>
> > > >>> * Kenneth Chan - Salesforce
> > > >>>
> > > >>> * Tom Chan - Salesforce
> > > >>>
> > > >>> * Simon Chan - Salesforce
> > > >>>
> > > >>> * Marco Vivero - Salesforce
> > > >>>
> > > >>> * Matthew Tovbin - Salesforce
> > > >>>
> > > >>> * Yevgeny Khodorkovsky - Salesforce
> > > >>>
> > > >>> * Felipe Oliveira - Salesforce
> > > >>>
> > > >>> * Vitaly Gordon - Salesforce
> > > >>>
> > > >>> === Sponsors ===
> > > >>>
> > > >>> ==== Champion ====
> > > >>>
> > > >>> Andrew Purtell
> > > >>>
> > > >>> ==== Nominated Mentors ====
> > > >>>
> > > >>> * Andrew Purtell
> > > >>>
> > > >>> * James Taylor
> > > >>>
> > > >>> * Lars Hofhansl
> > > >>>
> > > >>> * Suneel Marthi
> > > >>>
> > > >>> * Xiangrui Meng
> > > >>>
> > > >>> * Luciano Resende
> > > >>>
> > > >>> ==== Sponsoring Entity ====
> > > >>>
> > > >>> Apache Incubator PMC
> > > >>>
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > >> For additional commands, e-mail: general-help@incubator.apache.org
> > > >>
> > > >>
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbonofre@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> >
>

--001a114dc6940d250f05330fffb2--