incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Trivial Update of "ParquetProposal" by ChrisAniszczyk
Date Mon, 12 May 2014 16:49:52 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "ParquetProposal" page has been changed by ChrisAniszczyk:
https://wiki.apache.org/incubator/ParquetProposal?action=diff&rev1=2&rev2=3

  
  == Background ==
  
- Parquet is built from the ground up with complex nested data structures in mind, and uses
the repetition/definition level approach to encoding such data structures, as popularized
by [Google Dremel](https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We believe
this approach is superior to simple flattening of nested name spaces.
+ Parquet is built from the ground up with complex nested data structures in mind, and uses
the repetition/definition level approach to encoding such data structures, as popularized
by Google Dremel (https://blog.twitter.com/2013/dremel-made-simple-with-parquet). We believe
this approach is superior to simple flattening of nested name spaces.
  
  Parquet is built to support very efficient compression and encoding schemes. Parquet allows
compression schemes to be specified on a per-column level, and is future-proofed to allow
adding more encodings as they are invented and implemented. We separate the concepts of encoding
and compression, allowing parquet consumers to implement operators that work directly on encoded
data without paying decompression and decoding penalty when possible.
  
@@ -30, +30 @@

  
  == Current Status ==
  
- Parquet has undergone [2 major releases](https://github.com/Parquet/parquet-format/releases)
of the core format and [22 releases](https://github.com/Parquet/parquet-mr/releases) of the
supporting set of Java libraries. Parquet is being used in production by [many organizations](https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md).
The Parquet source is currently hosted at github.com, which will seed the Apache git repository.
+ Parquet has undergone 2 major releases: https://github.com/Parquet/parquet-format/releases
of the core format and 22 releases: https://github.com/Parquet/parquet-mr/releases of the
supporting set of Java libraries.
+ 
+ The Parquet source is currently hosted at GitHub, which will seed the Apache git repository.
  
  === Meritocracy ===
  
@@ -38, +40 @@

  
  === Community ===
  
- There is a large need for an advanced columnar storage format for Hadoop. Parquet is currently
being used by [several organizations](https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md).
By bringing Parquet into Apache, we believe that the community will grow even bigger.
+ There is a large need for an advanced columnar storage format for Hadoop. Parquet is being
used in production by many organizations (see https://github.com/Parquet/parquet-mr/blob/master/PoweredBy.md)
+ 
+  * Cloudera: https://twitter.com/HenryR/statuses/324222874011451392
+  * Criteo: https://twitter.com/julsimon/statuses/312114074911666177
+  * Salesforce: https://twitter.com/TwitterOSS/statuses/392734610116726784
+  * Stripe: https://twitter.com/avibryant/statuses/391339949250715648
+  * Twitter: https://twitter.com/J_/statuses/315844725611581441
+ 
+ By bringing Parquet into Apache, we believe that the community will grow even bigger.
  
  === Core Developers ===
  
- Parquet was [initially developed](https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop)
as a collaboration between Twitter, Cloudera and Criteo. 
+ Parquet was initially developed as a collaboration between Twitter, Cloudera and Criteo.
+ 
+ See https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop
  
  === Alignment ===
  
@@ -56, +68 @@

  
  === Inexperience with Open Source ===
  
- Parquet has existed as a healthy open source for one year. During that time, we have curated
an open-source community successfully, attracting over [30 contributors](https://github.com/Parquet/parquet-mr/graphs/contributors)
from a diverse group of companies.
+ Parquet has existed as a healthy open source for one year. During that time, we have curated
an open-source community successfully, attracting over 40 contributors (see https://github.com/Parquet/parquet-mr/graphs/contributors)
from a diverse group of companies.
  Several of the core contributors to the project are deeply familiar with OSS and Apache
specifically: Julien Le Dem is the current PMC Chair for Apache Pig, and Dmitriy Ryaboy, Aniket
Mokashi, and Jonathan Coveney are also Apache Pig committers with contributions to several
other Apache projects. Todd Lipcon and Tom White are committers to Apache Hadoop and multiple
other related projects. Brock Noland is a Hive committer.
  
  === Homogenous Developers ===
@@ -65, +77 @@

  
  === Reliance on Salaried Developers ===
  
- It is expected that Parquet development will occur on both salaried time and on volunteer
time, after hours. The majority of initial committers are paid by their employers to contribute
to this project. However, they are all passionate about the project, and we are confident
that the project will continue even if no salaried developers contribute to the project. As
evidence of this statement, we present the [GitHub "punchcard"](https://github.com/Parquet/parquet-mr/graphs/punch-card)
showing that a lot of activity happens on weekends. We are committed to recruiting additional
committers including non-salaried developers. 
+ It is expected that Parquet development will occur on both salaried time and on volunteer
time, after hours. The majority of initial committers are paid by their employers to contribute
to this project. However, they are all passionate about the project, and we are confident
that the project will continue even if no salaried developers contribute to the project. As
evidence of this statement, we present the GitHub punchcard (see https://github.com/Parquet/parquet-mr/graphs/punch-card)
showing that a lot of activity happens on weekends. We are committed to recruiting additional
committers including non-salaried developers. 
  
  === Relationships with Other Apache Products ===
  

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

