incubator-general mailing list archives

From Chris Douglas <cdoug...@apache.org>
Subject Re: [VOTE] Accept Apache AsterixDB in to the Incubator
Date Sun, 22 Feb 2015 04:18:00 GMT
+1 (binding) -C

On Thu, Feb 19, 2015 at 9:38 PM, Mattmann, Chris A (3980)
<chris.a.mattmann@jpl.nasa.gov> wrote:
> Hi Everyone,
>
> OK, discussion has died down on this thread. I was originally
> suggesting that the pTLP option may be best for this community,
> but after some discussions with the existing community of
> AsterixDB’ers proposing to bring the project here to the ASF,
> AsterixDB would like to move forward independent of whatever
> comes of the pTLP discussions.
>
> That said, I would like to propose Apache AsterixDB as an
> Incubator project. I am now calling a VOTE to accept AsterixDB
> into the Apache Incubator. This VOTE will run for at least 72 hours.
>
> [ ] +1 Accept Apache AsterixDB into the Incubator
> [ ] +0 Don’t care.
> [ ] -1 Don’t accept Apache AsterixDB into the Incubator because..
>
> Thanks for the feedback so far and looking forward to the VOTE!
>
> You can count my binding +1.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: <Mattmann>, Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
> Date: Wednesday, January 14, 2015 at 6:20 PM
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Cc: Michael Carey <dtabass@gmail.com>, Ian Maxon <imaxon@uci.edu>, Till
> Westmann <till@westmann.org>
> Subject: [PROPOSAL] Apache AsterixDB Incubator
>
>>Hi Folks,
>>
>>I am pleased to bring forth the Apache AsterixDB proposal to the
>>Apache Incubator as Champion, working in collaboration with the
>>team. Please find the wiki proposal here:
>>
>>https://wiki.apache.org/incubator/AsterixDBProposal
>>
>>
>>Full text of the proposal is below. Please discuss and enjoy. I’ll
>>leave the discussion open for a week, and then look to call a VOTE
>>hopefully end of next week if all is well.
>>
>>Cheers!
>>Chris Mattmann
>>
>>=============================================================
>>Apache AsterixDB Proposal
>>
>>Abstract
>>
>>Apache AsterixDB is a scalable big data management system (BDMS) that
>>provides storage, management, and query capabilities for large
>>collections of semi-structured data.
>>
>>Proposal
>>
>>AsterixDB is a big data management system (BDMS) with a feature set
>>that makes it well-suited to needs such as web data warehousing and
>>social data storage and analysis. Feature-wise, AsterixDB has:
>>
>>* A NoSQL style data model (ADM) based on extending JSON with object
>>  database concepts.
>>* An expressive and declarative query language (AQL) for querying
>>  semi-structured data.
>>* A runtime query execution engine, Hyracks, for partitioned-parallel
>>  execution of query plans.
>>* Partitioned LSM-based data storage and indexing for efficient
>>  ingestion of newly arriving data.
>>* Support for querying and indexing external data (e.g., in HDFS) as
>>  well as data stored within AsterixDB.
>>* A rich set of primitive data types, including support for spatial,
>>  temporal, and textual data.
>>* Indexing options that include B+ trees, R trees, and inverted
>>  keyword index support.
>>* Basic transactional (concurrency and recovery) capabilities akin to
>>  those of a NoSQL store.
>>
>>
>>Background and Rationale
>>
>>In the world of relational databases, the need to tackle data volumes
>>that exceed the capabilities of a single server led to the
>>development of “shared-nothing” parallel database systems several
>>decades ago. These systems spread data over a cluster based on a
>>partitioning strategy, such as hash partitioning, and queries are
>>processed by employing partitioned-parallel divide-and-conquer
>>techniques. Since these systems are fronted by a high-level,
>>declarative language (SQL), their users are shielded from the
>>complexities of parallel programming. Parallel database systems have
>>been an extremely successful application of parallel computing, and
>>quite a number of commercial products exist today.
>>
>>In the distributed systems world, the Web brought a need to index and
>>query its huge content. SQL and relational databases were not the
>>answer, though shared-nothing clusters again emerged as the hardware
>>platform of choice. Google developed the Google File System (GFS) and
>>MapReduce programming model to allow programmers to store and process
>>Big Data by writing a few user-defined functions. The MapReduce
>>framework applies these functions in parallel to data instances in
>>distributed files (map) and to sorted groups of instances sharing a
>>common key (reduce) -- not unlike the partitioned parallelism in
>>parallel database systems. Apache's Hadoop MapReduce platform is the
>>most prominent implementation of this paradigm for the rest of the
>>Big Data community. On top of Hadoop and HDFS sit declarative
>>languages like Pig and Hive that each compile down to Hadoop
>>MapReduce jobs.
>>
>>The big Web companies were also challenged by extreme user bases
>>(100s of millions of users) and needed fast simple lookups and
>>updates to very large keyed data sets like user profiles. SQL
>>databases were deemed either too expensive or not scalable, so the
>>“NoSQL movement” was born. The ASF now has HBase and Cassandra, two
>>popular key-value stores, in this space. MongoDB and Couchbase are
>>other open source alternatives (document stores).
>>
>>It is evident from the rapidly growing popularity of "NoSQL" stores,
>>as well as the strong demand for Big Data analytics engines today,
>>that there is a strong (and growing!) need to store, process, *and*
>>query large volumes of semi-structured data in many application
>>areas. Until very recently, developers have had to "choose" between
>>using big data analytics engines like Apache Hive or Apache Spark,
>>which can do complex query processing and analysis over HDFS-resident
>>files, and flexible but low-function data stores like MongoDB or
>>Apache HBase. (The Apache Phoenix project,
>>http://phoenix.apache.org/, is a recent SQL-over-HBase effort that
>>aims to bridge between these choices.)
>>
>>AsterixDB is a highly scalable data management system that can store,
>>index, and manage semi-structured data, e.g., much like MongoDB, but
>>it also supports a full-power query language with the expressiveness
>>of SQL (and more). Unlike analytics engines like Hive or Spark, it
>>stores and manages data, so AsterixDB can exploit its knowledge of
>>data partitioning and the availability of indexes to avoid always
>>scanning data set(s) to process queries. Somewhat surprisingly, there
>>is no open source parallel database system (relational or otherwise)
>>available to developers today -- AsterixDB aims to fill this need.
>>Since Apache is where the majority of today's most important Big
>>Data technologies live, the ASF seems like the obvious home for a
>>system like AsterixDB.
>>
>>Current Status
>>
>>The current version of AsterixDB was co-developed by a team of
>>faculty, staff, and students at UC Irvine and UC Riverside. The
>>project was initiated as a large NSF-sponsored project in 2009, the
>>goal of which was to combine the best ideas from the parallel
>>database world, the then new Hadoop world, and the semi-structured
>>(e.g., XML/JSON) data world in order to create a next-generation
>>BDMS. A first informal open source release was made four years later,
>>in June of 2013, under the Apache Software License 2.0.
>>
>>
>>Meritocracy
>>
>>The current developers are familiar with meritocratic open source
>>development at Apache. Apache was chosen specifically because we want
>>to encourage this style of development for the project.
>>
>>
>>Community
>>
>>While AsterixDB started as a university project, it has developed
>>into a community. A number of the initial committers started
>>contributing in academia and continue to actively participate and
>>contribute after graduation. We seek to further develop both the
>>developer and user communities. One ongoing way of broadening the
>>community is through academic collaborations (currently with IIT
>>Mumbai in India and TU Berlin in Germany). During incubation we will
>>also explicitly seek increased industrial participation.
>>
>>Some indicators of the effort's development community and history can
>>be found at:
>>https://www.openhub.net/p/asterixdb/contributors?query=&sort=commits_12_mo
>>and
>>https://www.openhub.net/p/hyracks/contributors?query=&sort=commits_12_mo
>>
>>
>>Core Developers
>>
>>The core developers of the project are diverse, although initially UC
>>Irvine heavy (roughly 50%) due to the project's origins at UCI. The
>>other 50% are from other academic institutions (UC Riverside and the
>>Hebrew University in Jerusalem) and companies (Couchbase, Facebook,
>>IBM, KACST Saudi Arabia, Oracle, Saudi Aramco, X15 Software).
>>
>>
>>Alignment
>>
>>Apache is, by far, the most natural home for taking the AsterixDB
>>project forward. A large fraction of today's top Big Data
>>technologies have their homes in Apache, including Hadoop, YARN, Pig,
>>Hive, Spark, Flink, HBase, Cassandra and others. AsterixDB fills a
>>significant gap -- the parallel data management system gap -- that
>>exists in the Big Data open source world. It is well-aligned with a
>>number of the Apache projects, e.g., it has strong support for
>>accessing and indexing external data in HDFS, and it uses YARN as an
>>answer to basic cluster resource management. AsterixDB also seeks to
>>achieve an Apache-style development model; it is seeking a broader
>>community of contributors and users in order to achieve its full
>>potential and value to the Big Data community.
>>
>>There are also a number of related Apache projects and dependencies
>>that are described below in the Relationships with Other Apache
>>Products section.
>>
>>
>>Known Risks
>>
>>Orphaned products
>>
>>Given the current level of intellectual investment in AsterixDB, the
>>risk of the project being abandoned is very small. The UCI/UCR
>>faculty team leads are highly incentivized to continue development
>>since the database groups at UC Irvine and UC Riverside are both
>>reliant on AsterixDB as a platform for long-term graduate research
>>projects. UC San Diego is also beginning to contribute to the code
>>base, and a collaboration involving public health applications is
>>forming with UCLA. The work on AsterixDB is managed via mailing list
>>discussions supplemented by weekly project status meetings, which
>>are summarized on the mailing list. Typical (local plus Skype-in)
>>attendance at the weekly status meetings is about 20 active
>>contributors.
>>
>>
>>Inexperience with Open Source
>>
>>AsterixDB and Hyracks were developed entirely as open source under
>>the ASL 2.0. The source code repositories, issue tracker, and mailing
>>lists are available on Google Code, and discussions and decisions
>>happen on the mailing lists (which is necessary due to the geographic
>>distribution of the current developers).
>>
>>Also, a few of the initial committers have contributed to Apache
>>projects. Vinayak Borkar is a committer on the Apache Helix and
>>Apache VXQuery projects. Till Westmann is the VP VXQuery at the ASF
>>and an IPMC member. Preston Carman and Steven Jacobs are committers
>>on the Apache VXQuery project.
>>
>>
>>Relationships with Other Apache Products
>>
>>Apache VXQuery is based on the Hyracks data-parallel runtime, which
>>is also included in the AsterixDB code base.
>>
>>AsterixDB is closely related to Apache Hadoop. Included in AsterixDB
>>is support for accessing external data in HDFS (and Hive formats),
>>and resource management and system administration features are in the
>>process of being migrated to YARN.
>>
>>AsterixDB's AQL query facilities offer comparable query power to
>>Apache's Pig and Hive systems for big data analytics. AsterixDB
>>differs in storing and indexing data and thus being able to quickly
>>answer small and medium queries without large HDFS data scans -
>>thereby targeting a different class of use cases.
>>
>>AsterixDB's data storage and indexing facilities are similar to those
>>of HBase, but AsterixDB differs in being a much more complete and
>>queryable BDMS (not just a key-value style store).
>>
>>AsterixDB's target use cases are not in-memory processing or
>>iterative algorithm support, making AsterixDB complementary to the
>>Apache Spark platform. (Spark interoperability is on our longer-term
>>to-do wishlist.)
>>
>>
>>Homogeneous Developers
>>
>>As mentioned before, the current community is already
>>organizationally and geographically distributed, and we would like
>>to increase the heterogeneity further.
>>
>>
>>Reliance on Salaried Developers
>>
>>Of the initial committers, only 3 are full-time UCI staff. The other
>>committers are a mix of students, alumni who continue to contribute
>>to the effort, and individuals working with permission part-time (or
>>in spare time) on this project.
>>
>>
>>An Excessive Fascination with the Apache Brand
>>
>>We believe in the processes, systems, and framework Apache has put in
>>place. Apache is also known to foster a great community around its
>>projects and provide exposure. While brand is important, our
>>fascination with it is not excessive. We believe that the ASF is the
>>right home for AsterixDB and that having AsterixDB inside of the ASF
>>will lead to a better long-term outcome for the Big Data community.
>>
>>
>>Documentation
>>
>>Documentation and publications related to AsterixDB can be found at
>>http://asterixdb.ics.uci.edu/.
>>
>>
>>Initial Source
>>
>>Current source resides in Google Code:
>>https://code.google.com/p/asterixdb/ (query language and upper system
>>layers) and https://code.google.com/p/hyracks/ (dataflow runtime
>>system and storage management libraries).
>>
>>
>>External Dependencies
>>
>>AsterixDB depends on a number of Apache projects:
>>
>>- Ant
>>- Avro
>>- ApacheDB JDO
>>- Commons
>>- Derby
>>- Hadoop
>>- Hive
>>- HTTPComponents
>>- Jakarta ORO
>>- Maven
>>- Tomcat
>>- Thrift
>>- Velocity
>>- Wicket
>>- Xerces
>>
>>and other open source projects (organized by license):
>>
>>-- ASL 2.0:
>> - Jackson
>> - Google Guava
>> - Google Guice
>> - JSON-simple
>> - BoneCP
>> - Microsoft Azure SDK
>> - Netty
>> - Rome
>> - JetS3t
>> - Groovy
>> - Jettison
>> - Plexus
>> - Datanucleus (JDO)
>> - Jetty
>> - Twitter4J
>> - Snappy-java
>>
>>-- BSD:
>> - Antlr
>> - ObjectWeb ASM
>> - Protobuf
>> - JSCH
>> - JavaCC
>> - Paranamer
>> - JLine
>> - Stax
>> - StringTemplate
>> - xmlEnc
>>
>>-- MIT:
>> - AppAssembler
>> - SimpleLog4J
>>
>>-- CDDL 1.0:
>> - Java Activation Framework
>> - Java Transactions
>> - Java Servlet API
>> - Grizzly
>> - gmbal
>> - Glassfish
>>
>>-- CDDL 1.1:
>> - Jersey
>> - JAXB Reference Implementation
>>
>>-- JSON License:
>> - JSON
>>
>>-- EPL 1.0:
>> - JUnit
>>
>>-- JDOM License:
>> - JDOM
>>
>>-- Public Domain:
>> - xz
>> - AOPAlliance
>>
>>As all dependencies are managed using Apache Maven, none of the
>>external libraries need to be packaged in a source distribution.
>>
>>
>>Required Resources
>>
>>Developer and user mailing lists
>>
>>private@asterixdb.incubator.apache.org (with moderated subscriptions)
>>commits@asterixdb.incubator.apache.org
>>dev@asterixdb.incubator.apache.org
>>users@asterixdb.incubator.apache.org
>>
>>
>>A git repository
>>
>>https://git-wip-us.apache.org/repos/asf/incubator-asterixdb.git
>>
>>
>>A JIRA issue tracker
>>
>>https://issues.apache.org/jira/browse/ASTERIXDB
>>
>>
>>Initial Committers
>>
>>The following is a list of the planned initial Apache committers (the
>>active subset of the committers for the current repository at Google
>>Code).
>>
>>Abdullah Alamoudi (bamousaa@gmail.com)
>>Cameron Samak (eufery@gmail.com)
>>Chen Li (chenli@gmail.com)
>>Ian Maxon (imaxon@uci.edu)
>>Ildar Absalyamov (ildar.absalyamov@gmail.com)
>>Jianfeng Jia (jianfeng.jia@gmail.com)
>>Keren Ouaknine (kereno@gmail.com)
>>Markus Dreseler (apache@dreseler.de)
>>Mike Carey (dtabass@apache.org)
>>Murtadha Hubail (hubailmor@gmail.com)
>>Pouria Pirzadeh (pouria.pirzadeh@gmail.com)
>>Preston Carman (prestonc@apache.org)
>>Raman Grover (RamanGrover29@gmail.com)
>>Sattam Alsubaiee (salsubaiee@gmail.com)
>>Steven Jacobs (sjaco002@apache.org)
>>Taewoo Kim (wangsaeu@gmail.com)
>>Till Westmann (tillw@apache.org)
>>Vinayak Borkar (vinayakb@apache.org)
>>Yingyi Bu (buyingyi@gmail.com)
>>Young-Seok Kim (kisskys@gmail.com)
>>Zach Heilbron (zheilbron@gmail.com)
>>
>>
>>Affiliations
>>
>>UC Irvine
>>- Mike Carey
>>- Chen Li
>>- Ian Maxon
>>- Yingyi Bu
>>- Raman Grover
>>- Pouria Pirzadeh
>>- Young-Seok Kim
>>- Cameron Samak
>>- Taewoo Kim
>>- Jianfeng Jia
>>- Murtadha Hubail
>>- Markus Dreseler
>>
>>UC Riverside
>>- Ildar Absalyamov
>>- Preston Carman
>>- Steven Jacobs
>>
>>Hebrew University
>>- Keren Ouaknine
>>
>>Oracle
>>- Till Westmann
>>
>>X15 Software
>>- Vinayak Borkar
>>- Zach Heilbron
>>
>>KACST Saudi Arabia
>>- Sattam Alsubaiee
>>
>>Saudi Aramco
>>- Abdullah Alamoudi
>>
>>Carey, Li, and Maxon are full-time UCI staff, with the remaining UCI
>>(UC Irvine) and UCR (UC Riverside) affiliates being students. The
>>non-UC committers are a mix of alumni who continue to contribute to
>>the effort and individuals working with permission part-time (or in
>>spare time) on this project.
>>
>>
>>Sponsors
>>
>>Champion
>>
>>Chris Mattmann (NASA/JPL)
>>
>>Nominated Mentors
>>
>>TBD
>>
>>Sponsoring Entity
>>
>>The Apache Incubator
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org


