Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "ParquetProposal" page has been changed by ChrisAniszczyk:
https://wiki.apache.org/incubator/ParquetProposal?action=diff&rev1=2&rev2=3

  == Background ==

- Parquet is built from the ground up with complex nested data structures in mind, and uses the repetition/definition level approach to encoding such data structures, as popularized by [Google Dremel](https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We believe this approach is superior to simple flattening of nested name spaces.
+ Parquet is built from the ground up with complex nested data structures in mind, and uses the repetition/definition level approach to encoding such data structures, as popularized by Google Dremel (https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We believe this approach is superior to simple flattening of nested name spaces.

  Parquet is built to support very efficient compression and encoding schemes. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented. We separate the concepts of encoding and compression, allowing parquet consumers to implement operators that work directly on encoded data without paying decompression and decoding penalty when possible.

@@ -30, +30 @@

  == Current Status ==

- Parquet has undergone [2 major releases](https://github.com/Parquet/parquet-format/releases) of the core format and [22 releases](https://github.com/Parquet/parquet-mr/releases) of the supporting set of Java libraries. Parquet is being used in production by [many organizations](https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md). The Parquet source is currently hosted at github.com, which will seed the Apache git repository.
+ Parquet has undergone 2 major releases: https://github.com/Parquet/parquet-format/releases of the core format and 22 releases: https://github.com/Parquet/parquet-mr/releases of the supporting set of Java libraries.
+
+ The Parquet source is currently hosted at GitHub, which will seed the Apache git repository.

  === Meritocracy ===

@@ -38, +40 @@

  === Community ===

- There is a large need for an advanced columnar storage format for Hadoop. Parquet is currently being used by [several organizations](https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md). By bringing Parquet into Apache, we believe that the community will grow even bigger.
+ There is a large need for an advanced columnar storage format for Hadoop. Parquet is being used in production by many organizations (see https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md)
+
+ * Cloudera: https://twitter.com/HenryR/statuses/324222874011451392
+ * Criteo: https://twitter.com/julsimon/statuses/312114074911666177
+ * Salesforce: https://twitter.com/TwitterOSS/statuses/392734610116726784
+ * Stripe: https://twitter.com/avibryant/statuses/391339949250715648
+ * Twitter: https://twitter.com/J_/statuses/315844725611581441
+
+ By bringing Parquet into Apache, we believe that the community will grow even bigger.

  === Core Developers ===

- Parquet was [initially developed](https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop) as a collaboration between Twitter, Cloudera and Criteo.
+ Parquet was initially developed as a collaboration between Twitter, Cloudera and Criteo.
+
+ See https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop

  === Alignment ===

@@ -56, +68 @@

  === Inexperience with Open Source ===

- Parquet has existed as a healthy open source for one year. During that time, we have curated an open-source community successfully, attracting over [30 contributors](https://github.com/Parquet/parquet-mr/graphs/contributors) from a diverse group of companies.
+ Parquet has existed as a healthy open source for one year. During that time, we have curated an open-source community successfully, attracting over 40 contributors (see https://github.com/Parquet/parquet-mr/graphs/contributors) from a diverse group of companies.

  Several of the core contributors to the project are deeply familiar with OSS and Apache specifically: Julien Le Dem is the current PMC Chair for Apache Pig, and Dmitriy Ryaboy, Aniket Mokashi, and Jonathan Coveney are also Apache Pig committers with contributions to several other Apache projects. Todd Lipcon and Tom White are committers to Apache Hadoop and multiple other related projects. Brock Noland is a Hive committer.

  === Homogenous Developers ===

@@ -65, +77 @@

  === Reliance on Salaried Developers ===

- It is expected that Parquet development will occur on both salaried time and on volunteer time, after hours. The majority of initial committers are paid by their employers to contribute to this project. However, they are all passionate about the project, and we are confident that the project will continue even if no salaried developers contribute to the project. As evidence of this statement, we present the [GitHub "punchcard"](https://github.com/Parquet/parquet-mr/graphs/punch-card) showing that a lot of activity happens on weekends. We are committed to recruiting additional committers including non-salaried developers.
+ It is expected that Parquet development will occur on both salaried time and on volunteer time, after hours. The majority of initial committers are paid by their employers to contribute to this project. However, they are all passionate about the project, and we are confident that the project will continue even if no salaried developers contribute to the project. As evidence of this statement, we present the GitHub punchcard (see https://github.com/Parquet/parquet-mr/graphs/punch-card) showing that a lot of activity happens on weekends. We are committed to recruiting additional committers including non-salaried developers.

  === Relationships with Other Apache Products ===

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org