incubator-general mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: [DISCUSS] Apache Pinot Incubator Proposal
Date Wed, 14 Feb 2018 01:01:19 GMT
Kevin,

Increasing the adoption of Pinot is one thing that can help build a good,
diverse community. A few things that come to mind:
- Improve documentation
- Better integration with cloud providers
- Meetups and blog posts

We would also love to get additional mentors from ASF to help us build the
community around Pinot.




On Tue, Feb 13, 2018 at 4:29 PM, Timothy Chen <tnachen@gmail.com> wrote:

> Love to see this in the incubator as well. +1
>
> Tim
>
> On Tue, Feb 13, 2018 at 4:22 PM, Kevin A. McGrail
> <kevin.mcgrail@mcgrail.com> wrote:
> > Agreed.  It could use more mentors from ASF which I'm too overloaded to
> > help with but I'd be inclined to +1 this.  Do you have some thoughts on
> > getting more community people outside of LI and Uber to help?
> >
> > On 2/13/2018 7:07 PM, Dave Fisher wrote:
> >>
> >> Noir or Blanc? Gris or Grigio? What’s the vintage?
> >>
> >> All kidding aside this looks interesting.
> >>
> >> Regards,
> >> Dave
> >>
> >> Sent from my iPhone
> >>
> >>> On Feb 13, 2018, at 12:10 AM, kishore g <g.kishore@gmail.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I would like to propose Pinot as an Apache Incubator project. The
> >>> proposal is available as a draft at
> >>> https://wiki.apache.org/incubator/PinotProposal. I have also included
> >>> the text of the proposal below.
> >>>
> >>> Any feedback from the community is much appreciated.
> >>>
> >>> Regards,
> >>> Kishore G
> >>>
> >>> = Pinot Proposal =
> >>>
> >>> == Abstract ==
> >>>
> >>> Pinot is a distributed columnar storage engine that can ingest data in
> >>> real time and serve analytical queries at low latency. There are two
> >>> modes of data ingestion: batch and/or real-time. Batch mode allows users
> >>> to generate Pinot segments externally using systems such as Hadoop.
> >>> These segments can be uploaded into Pinot via simple curl calls. Pinot
> >>> can also ingest data in near real time from streaming sources such as
> >>> Kafka. Data ingested into Pinot is stored in a columnar format. Pinot
> >>> provides a SQL-like interface (PQL) that supports filters, aggregations,
> >>> and group-by operations. It does not support joins by design, in order
> >>> to guarantee predictable latency. It leverages other Apache projects
> >>> such as Zookeeper, Kafka, and Helix, along with many libraries from the
> >>> ASF.
> >>>
> >>> == Proposal ==
> >>>
> >>> Pinot was open sourced by LinkedIn and is hosted on GitHub. The majority
> >>> of the development happens at LinkedIn, with other contributions from
> >>> Uber and Slack. We believe that being a part of the Apache Software
> >>> Foundation will improve the diversity and help form a strong community
> >>> around the project.
> >>>
> >>> LinkedIn submits this proposal to donate the code base to the Apache
> >>> Software Foundation. The code is already under Apache License 2.0.  Code
> >>> and the documentation are hosted on GitHub.
> >>> * Code: http://github.com/linkedin/pinot
> >>> * Documentation: https://github.com/linkedin/pinot/wiki
> >>>
> >>>
> >>> == Background ==
> >>>
> >>> LinkedIn, similar to other companies, has many applications that
> >>> provide rich real-time insights to members and customers (internal and
> >>> external). The workload characteristics for these applications vary a
> >>> lot. Some internal applications simply need ad-hoc query capabilities
> >>> with latencies from sub-second to multiple seconds. But external
> >>> site-facing applications require strong SLAs even under very high
> >>> workloads. Prior to Pinot, LinkedIn had multiple solutions depending on
> >>> the workload generated by the application, and this was inefficient.
> >>> Pinot was developed to be the single platform that addresses all
> >>> classes of applications. Today at LinkedIn, Pinot powers more than 50
> >>> site-facing products with workloads ranging from a few queries per
> >>> second to thousands of queries per second while maintaining a 99th
> >>> percentile latency that can be as low as a few milliseconds. All
> >>> internal dashboards at LinkedIn are powered by Pinot.
> >>>
> >>> == Rationale ==
> >>>
> >>> We believe that the requirement to develop rich real-time analytic
> >>> applications is applicable to other organizations. Both Pinot and the
> >>> interested communities would benefit from this work being openly
> >>> available.
> >>>
> >>> == Current Status ==
> >>>
> >>> Pinot is currently open sourced under the Apache License Version 2.0
> >>> and available at github.com/linkedin/pinot. All the development is done
> >>> using GitHub Pull Requests. We cut releases on a weekly basis and deploy
> >>> them at LinkedIn. mp-0.1.468 is the latest release tag that is deployed
> >>> in production.
> >>>
> >>> == Meritocracy ==
> >>>
> >>> Following the Apache meritocracy model, we intend to build an open and
> >>> diverse community around Pinot. We will encourage the community to
> >>> contribute to the discussion and the codebase.
> >>>
> >>> == Community ==
> >>>
> >>> Pinot is currently used extensively at LinkedIn and Uber. Several
> >>> companies have expressed interest in the project. We hope to extend the
> >>> contributor base significantly by bringing Pinot into Apache.
> >>>
> >>> == Core Developers ==
> >>>
> >>> Pinot was started by engineers at LinkedIn, and now has committers from
> >>> Uber.
> >>>
> >>> == Alignment ==
> >>>
> >>> Apache is the most natural home for taking Pinot forward. Pinot
> >>> leverages several existing Apache Projects such as Kafka, Helix,
> >>> Zookeeper, and Avro. As Pinot gains adoption, we plan to add support for
> >>> the ORC and Parquet formats, as well as integration with Yarn and Mesos.
> >>>
> >>> == Known Risks ==
> >>>
> >>> === Orphaned Products ===
> >>>
> >>> The risk of the Pinot project being abandoned is minimal. The teams at
> >>> LinkedIn and Uber are highly incentivized to continue development of
> >>> Pinot as it is a critical part of their infrastructure.
> >>>
> >>> === Inexperience with Open Source ===
> >>>
> >>> Since being open sourced, Pinot has been developed entirely on GitHub.
> >>> All the current developers on Pinot are well aware of the open source
> >>> development process. However, most of the developers are new to the
> >>> Apache process. Kishore Gopalakrishna, one of the lead developers in
> >>> Pinot, is the VP and a committer of the Apache Helix project.
> >>>
> >>> === Homogenous Developers ===
> >>>
> >>> The current core developers are all from LinkedIn and Uber. However, we
> >>> hope to establish a developer community that includes contributors from
> >>> several corporations and we are actively encouraging new contributors
> >>> via the mailing lists and public presentations of Pinot.
> >>>
> >>> === Reliance on Salaried Developers ===
> >>>
> >>> It is expected that Pinot development will occur on both salaried time
> >>> and on volunteer time, after hours. The majority of initial committers
> >>> are paid by their employer to contribute to this project. However, they
> >>> are all passionate about the project, and we are confident that the
> >>> project will continue even if no salaried developers contribute to the
> >>> project. We are committed to recruiting additional committers including
> >>> non-salaried developers.
> >>>
> >>> === Relationships with Other Apache Products ===
> >>>
> >>> As mentioned earlier, Pinot uses several Apache Projects, such as Kafka
> >>> to ingest data in real time and Zookeeper and Helix for cluster
> >>> management. Pinot also uses Maven for build and release. We foresee
> >>> adding support for the Parquet and ORC formats. Adding the ability to
> >>> deploy on Yarn and Mesos clusters is another interesting project we
> >>> might pursue.
> >>>
> >>> === An Excessive Fascination with the Apache Brand ===
> >>>
> >>> While we respect the reputation of the Apache brand and have no doubts
> >>> that it will attract contributors and users, we believe ASF is the
> >>> right home for Pinot to foster a great community that will lead to a
> >>> better outcome in the long term.
> >>>
> >>> == Documentation ==
> >>>
> >>> * Code: https://github.com/linkedin/pinot/
> >>> * Documentation: https://github.com/linkedin/pinot/wiki
> >>> * User group: https://groups.google.com/forum/#!forum/pinot_users
> >>>
> >>> == Initial Source ==
> >>>
> >>> The current Pinot codebase is hosted on GitHub and licensed under the
> >>> Apache License V2. The source tree is self-contained and relies on
> >>> Maven as its build and dependency resolution mechanism.
> >>>
> >>> == External Dependencies ==
> >>>
> >>> All dependencies in Pinot have licenses that are compatible with Apache
> >>> License V2, except for the org.json library, which will be removed
> >>> prior to Apache incubation. The list below summarizes the external
> >>> dependencies of Pinot grouped by license and ASF license category.
> >>>
> >>> Dependencies from the ASF Category A
> >>> === Apache License 2.0 ===
> >>> * com.101tec:zkclient:0.7
> >>> * com.alibaba:fastjson:1.1.24
> >>> * com.clearspring.analytics:stream:2.7.0
> >>> * com.fasterxml.jackson.core:jackson-annotations:2.8.0
> >>> * com.fasterxml.jackson.core:jackson-core:2.8.0
> >>> * com.fasterxml.jackson.core:jackson-databind:2.8.0
> >>> * com.google.code.findbugs:jsr305:3.0.0
> >>> * com.google.guava:guava:19
> >>> * com.ning:async-http-client:1.9.21
> >>> * com.yammer.metrics:metrics-core:2.2.0
> >>> * commons-beanutils:commons-beanutils:1.8.3
> >>> * commons-cli:commons-cli:1.2
> >>> * commons-codec:commons-codec:1.6
> >>> * commons-configuration:commons-configuration:1.6
> >>> * commons-fileupload:commons-fileupload:1.2.2
> >>> * commons-httpclient:commons-httpclient:3.1
> >>> * commons-io:commons-io:2.1
> >>> * commons-validator:commons-validator:1.4.0
> >>> * io.netty:netty-all:4.1.4.Final
> >>> * io.swagger:swagger-jaxrs:1.5.10
> >>> * io.swagger:swagger-jersey2-jaxrs:1.5.10
> >>> * it.unimi.dsi:fastutil:6.5.16
> >>> * joda-time:joda-time:2
> >>> * log4j:log4j:1.2.17
> >>> * me.lemire.integercompression:JavaFastPFOR:0.0.13
> >>> * nl.jqno.equalsverifier:equalsverifier:1.7.2
> >>> * org.apache.avro:avro:1.7.6
> >>> * org.apache.commons:commons-compress:1.9
> >>> * org.apache.commons:commons-lang3:3.5
> >>> * org.apache.commons:commons-math:2.1
> >>> * org.apache.hadoop:hadoop-client:2.7.0
> >>> * org.apache.hadoop:hadoop-common:2.7.0
> >>> * org.apache.helix:helix-core:0.6.8
> >>> * org.apache.httpcomponents:httpclient:4.1.3
> >>> * org.apache.httpcomponents:httpclient:4.2.5
> >>> * org.apache.httpcomponents:httpcore:4.2.5
> >>> * org.apache.httpcomponents:httpmime:4.2.5
> >>> * org.apache.kafka:kafka_2.10:0.9.0.1
> >>> * org.apache.thrift:libthrift:0.9.1
> >>> * org.apache.zookeeper:zookeeper:3.4.9
> >>> * org.codehaus.jackson:jackson-core-asl:1.9.6
> >>> * org.codehaus.jackson:jackson-mapper-asl:1.9.6
> >>> * org.roaringbitmap:RoaringBitmap:0.5.10
> >>> * org.testng:testng:6.0.1
> >>> * org.twitter4j:twitter4j-core:4.0.3
> >>> * org.webjars:swagger-ui:2.2.2
> >>> * org.xerial.larray:larray:0.2.1
> >>> * org.yaml:snakeyaml:1.16
> >>> * xml-apis:xml-apis:1.0.b2
> >>> === Dual license (Apache License 2.0 + LGPL 2.1), using under the Apache License ===
> >>> * org.codehaus.jackson:jackson-jaxrs:1.9.6
> >>> * org.codehaus.jackson:jackson-xc:1.9.6
> >>> === BSD ===
> >>> * com.jcabi:jcabi-log:0.17.1
> >>> * org.antlr:antlr4-annotations:4.3
> >>> * org.antlr:antlr4-runtime:4.3
> >>> === MIT ===
> >>> * com.github.nkzawa:socket.io-client:0.5.1
> >>> * org.mockito:mockito-core:2.10.0
> >>> * org.slf4j:slf4j-api:1.7.7
> >>> * org.slf4j:slf4j-log4j12:1.7.7
> >>>
> >>> === Dependencies from the ASF Category B ===
> >>> Dual license (CDDL 1.1 + GPL 2 w/ CPE), using under the CDDL
> >>> * com.sun.jersey:jersey-client:1.19.2
> >>> * javax.servlet:javax.servlet-api:3.0.1
> >>> * org.glassfish.jersey.containers:jersey-container-grizzly2-http:2.23
> >>> * org.glassfish.jersey.core:jersey-common:2.23
> >>> * org.glassfish.jersey.core:jersey-server:2.23
> >>> * org.glassfish.jersey.media:jersey-media-json-jackson:2.24
> >>> * org.glassfish.jersey.media:jersey-media-multipart:2.23
> >>>
> >>> === Dependencies from the ASF Category X ===
> >>> JSON License
> >>> * org.json:json:20080701 (to be removed before Apache incubation)
> >>>
> >>>
> >>> == Cryptography ==
> >>>
> >>> None
> >>>
> >>> == Required Resources ==
> >>>
> >>> === Mailing lists ===
> >>>
> >>> * pinot-private (with moderated subscriptions)
> >>> * pinot-user
> >>> * pinot-dev
> >>> * pinot-commits
> >>>
> >>> === Git repository ===
> >>>
> >>> * git://git.apache.org/pinot
> >>> * https://git-wip-us.apache.org/repos/asf/incubator-pinot.git
> >>>
> >>> === Issue Tracking ===
> >>>
> >>> A JIRA Issue tracker (PINOT)
> >>>
> >>> === Other Resources ===
> >>>
> >>> The existing code already has unit and integration tests and we use
> >>> Travis to test the patch before committing it to master. We would like
> >>> to have an instance of Jenkins to achieve similar functionality.
> >>>
> >>> == Initial Committers ==
> >>>
> >>> * Kishore Gopalakrishna
> >>> * Ravi Aringunram
> >>> * Jean-François Im
> >>> * Mayank Shrivastava
> >>> * Subbu Subramaniam
> >>> * Adwait Tumbde
> >>> * Xiaotian Jiang
> >>> * Jennifer Dai
> >>> * Seunghyun Lee
> >>> * Xiang Fu
> >>> * Dhaval Patel
> >>> * Neha Pawar
> >>> * Alex Pucher
> >>> * Yen-Jung Chang
> >>>
> >>>
> >>>
> >>> == Affiliations  ==
> >>>
> >>> * Kishore Gopalakrishna (LinkedIn)
> >>> * Ravi Aringunram (LinkedIn)
> >>> * Jean-François Im (LinkedIn)
> >>> * Mayank Shrivastava (LinkedIn)
> >>> * Subbu Subramaniam (LinkedIn)
> >>> * Adwait Tumbde (LinkedIn)
> >>> * Xiaotian Jiang (LinkedIn)
> >>> * Jennifer Dai (LinkedIn)
> >>> * Seunghyun Lee (LinkedIn)
> >>> * Xiang Fu (Uber)
> >>> * Dhaval Patel (Uber)
> >>> * Neha Pawar (LinkedIn)
> >>> * Alex Pucher (LinkedIn)
> >>> * Yen-Jung Chang (LinkedIn)
> >>>
> >>> == Sponsors ==
> >>>
> >>> === Champion ===
> >>>
> >>> * Olivier Lamy <olamy at apache dot org>
> >>>
> >>> === Nominated Mentors ===
> >>>
> >>> * Olivier Lamy <olamy at apache dot org>
> >>>
> >>> === Sponsoring Entity ===
> >>>
> >>> The Apache Incubator
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> For additional commands, e-mail: general-help@incubator.apache.org
> >>
> >
