incubator-general mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: [VOTE] Accept Apache AsterixDB in to the Incubator
Date Tue, 24 Feb 2015 01:57:25 GMT
Thx!

On 2/23/15 4:26 PM, Mattmann, Chris A (3980) wrote:
> Thank you Ate! I have added you as a mentor on the proposal!
>
> https://wiki.apache.org/incubator/AsterixDBProposal
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
> -----Original Message-----
> From: Ate Douma <ate@douma.nu>
> Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
> Date: Monday, February 23, 2015 at 6:47 AM
> To: "general@incubator.apache.org" <general@incubator.apache.org>
> Cc: Michael Carey <dtabass@gmail.com>, Ian Maxon <imaxon@uci.edu>, Till
> Westmann <till@westmann.org>
> Subject: Re: [VOTE] Accept Apache AsterixDB in to the Incubator
>
>> +1 (binding)
>>
>> Very interesting.
>> And if you still like or need another mentor, I'd be willing to help out.
>>
>> Ate
>>
>> On 2015-02-20 06:38, Mattmann, Chris A (3980) wrote:
>>> Hi Everyone,
>>>
>>> OK, discussion has died down on this thread. I was originally
>>> suggesting that the pTLP option may be best for this community,
>>> but after some discussions with the existing community of
>>> AsterixDB’ers proposing to bring the project here to the ASF,
>>> AsterixDB would like to move forward independent of whatever
>>> comes of the pTLP discussions.
>>>
>>> That said, I would like to propose Apache AsterixDB as an
>>> Incubator project. I am now calling a VOTE to accept AsterixDB
>>> into the Apache Incubator. This VOTE will run for at least 72 hours.
>>>
>>> [ ] +1 Accept Apache AsterixDB into the Incubator
>>> [ ] +0 Don’t care.
>>> [ ] -1 Don’t accept Apache AsterixDB into the Incubator because..
>>>
>>> Thanks for the feedback so far and looking forward to the VOTE!
>>>
>>> You can count my binding +1.
>>>
>>> Cheers,
>>> Chris
>>>
>>> -----Original Message-----
>>> From: <Mattmann>, Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
>>> Date: Wednesday, January 14, 2015 at 6:20 PM
>>> To: "general@incubator.apache.org" <general@incubator.apache.org>
>>> Cc: Michael Carey <dtabass@gmail.com>, Ian Maxon <imaxon@uci.edu>, Till
>>> Westmann <till@westmann.org>
>>> Subject: [PROPOSAL] Apache AsterixDB Incubator
>>>
>>>> Hi Folks,
>>>>
>>>> I am pleased to bring forth the Apache AsterixDB proposal to the
>>>> Apache Incubator as Champion, working in collaboration with the
>>>> team. Please find the wiki proposal here:
>>>>
>>>> https://wiki.apache.org/incubator/AsterixDBProposal
>>>>
>>>>
>>>> Full text of the proposal is below. Please discuss and enjoy. I’ll
>>>> leave the discussion open for a week, and then look to call a VOTE
>>>> hopefully end of next week if all is well.
>>>>
>>>> Cheers!
>>>> Chris Mattmann
>>>>
>>>> =============================================================
>>>> Apache AsterixDB Proposal
>>>>
>>>> Abstract
>>>>
>>>> Apache AsterixDB is a scalable big data management system (BDMS) that
>>>> provides storage, management, and query capabilities for large
>>>> collections of semi-structured data.
>>>>
>>>> Proposal
>>>>
>>>> AsterixDB is a big data management system (BDMS) with a feature set
>>>> that makes it well-suited to needs such as web data warehousing and
>>>> social data storage and analysis. Feature-wise, AsterixDB has:
>>>>
>>>> * A NoSQL style data model (ADM) based on extending JSON with object
>>>>    database concepts.
>>>> * An expressive and declarative query language (AQL) for querying
>>>>    semi-structured data.
>>>> * A runtime query execution engine, Hyracks, for partitioned-parallel
>>>>    execution of query plans.
>>>> * Partitioned LSM-based data storage and indexing for efficient
>>>>    ingestion of newly arriving data.
>>>> * Support for querying and indexing external data (e.g., in HDFS) as
>>>>    well as data stored within AsterixDB.
>>>> * A rich set of primitive data types, including support for spatial,
>>>>    temporal, and textual data.
>>>> * Indexing options that include B+ trees, R trees, and inverted
>>>>    keyword index support.
>>>> * Basic transactional (concurrency and recovery) capabilities akin to
>>>>    those of a NoSQL store.
>>>>
>>>>
>>>> Background and Rationale
>>>>
>>>> In the world of relational databases, the need to tackle data volumes
>>>> that exceed the capabilities of a single server led to the
>>>> development of “shared-nothing” parallel database systems several
>>>> decades ago. These systems spread data over a cluster based on a
>>>> partitioning strategy, such as hash partitioning, and queries are
>>>> processed by employing partitioned-parallel divide-and-conquer
>>>> techniques. Since these systems are fronted by a high-level,
>>>> declarative language (SQL), their users are shielded from the
>>>> complexities of parallel programming. Parallel database systems have
>>>> been an extremely successful application of parallel computing, and
>>>> quite a number of commercial products exist today.
>>>>
>>>> In the distributed systems world, the Web brought a need to index and
>>>> query its huge content. SQL and relational databases were not the
>>>> answer, though shared-nothing clusters again emerged as the hardware
>>>> platform of choice. Google developed the Google File System (GFS) and
>>>> MapReduce programming model to allow programmers to store and process
>>>> Big Data by writing a few user-defined functions. The MapReduce
>>>> framework applies these functions in parallel to data instances in
>>>> distributed files (map) and to sorted groups of instances sharing a
>>>> common key (reduce) -- not unlike the partitioned parallelism in
>>>> parallel database systems. Apache's Hadoop MapReduce platform is the
>>>> most prominent implementation of this paradigm for the rest of the
>>>> Big Data community. On top of Hadoop and HDFS sit declarative
>>>> languages like Pig and Hive that each compile down to Hadoop
>>>> MapReduce jobs.
>>>>
>>>> The big Web companies were also challenged by extreme user bases
>>>> (100s of millions of users) and needed fast simple lookups and
>>>> updates to very large keyed data sets like user profiles. SQL
>>>> databases were deemed either too expensive or not scalable, so the
>>>> “NoSQL movement” was born. The ASF now has HBase and Cassandra, two
>>>> popular key-value stores, in this space. MongoDB and Couchbase are
>>>> other open source alternatives (document stores).
>>>>
>>>> It is evident from the rapidly growing popularity of "NoSQL" stores,
>>>> as well as the strong demand for Big Data analytics engines today,
>>>> that there is a strong (and growing!) need to store, process, *and*
>>>> query large volumes of semi-structured data in many application
>>>> areas. Until very recently, developers have had to "choose" between
>>>> using big data analytics engines like Apache Hive or Apache Spark,
>>>> which can do complex query processing and analysis over HDFS-resident
>>>> files, and flexible but low-function data stores like MongoDB or
>>>> Apache HBase. (The Apache Phoenix project,
>>>> http://phoenix.apache.org/, is a recent SQL-over-HBase effort that
>>>> aims to bridge between these choices.)
>>>>
>>>> AsterixDB is a highly scalable data management system that can store,
>>>> index, and manage semi-structured data, e.g., much like MongoDB, but
>>>> it also supports a full-power query language with the expressiveness
>>>> of SQL (and more). Unlike analytics engines like Hive or Spark, it
>>>> stores and manages data, so AsterixDB can exploit its knowledge of
>>>> data partitioning and the availability of indexes to avoid always
>>>> scanning data set(s) to process queries. Somewhat surprisingly, there
>>>> is no open source parallel database system (relational or otherwise)
>>>> available to developers today -- AsterixDB aims to fill this need.
>>>> Since Apache is where the majority of today's most important Big
>>>> Data technologies live, the ASF seems like the obvious home for a
>>>> system like AsterixDB.
>>>>
>>>> Current Status
>>>>
>>>> The current version of AsterixDB was co-developed by a team of
>>>> faculty, staff, and students at UC Irvine and UC Riverside. The
>>>> project was initiated as a large NSF-sponsored project in 2009, the
>>>> goal of which was to combine the best ideas from the parallel
>>>> database world, the then new Hadoop world, and the semi-structured
>>>> (e.g., XML/JSON) data world in order to create a next-generation
>>>> BDMS. A first informal open source release was made four years later,
>>>> in June of 2013, under the Apache Software License 2.0.
>>>>
>>>>
>>>> Meritocracy
>>>>
>>>> The current developers are familiar with meritocratic open source
>>>> development at Apache. Apache was chosen specifically because we want
>>>> to encourage this style of development for the project.
>>>>
>>>>
>>>> Community
>>>>
>>>> While AsterixDB started as a university project, it has developed into
>>>> a community. A number of the initial committers started contributing
>>>> in academia and continue to actively participate and contribute after
>>>> graduation. We also seek to further develop the developer and user
>>>> communities. One ongoing way of broadening the community is through
>>>> academic collaborations (currently with IIT Mumbai in India and TU
>>>> Berlin in Germany). During incubation we will also explicitly seek
>>>> increased industrial participation.
>>>>
>>>> Some indicators of the effort's development community and history can
>>>> be found at:
>>>>
>>>> https://www.openhub.net/p/asterixdb/contributors?query=&sort=commits_12_mo
>>>>
>>>> https://www.openhub.net/p/hyracks/contributors?query=&sort=commits_12_mo
>>>>
>>>>
>>>> Core Developers
>>>>
>>>> The core developers of the project are diverse, although initially UC
>>>> Irvine-heavy (roughly 50%) due to the project's origins at UCI. The
>>>> other 50% are from other academic institutions (UC Riverside and the
>>>> Hebrew University in Jerusalem) and companies (Couchbase, Facebook,
>>>> IBM, KACST Saudi Arabia, Oracle, Saudi Aramco, X15 Software).
>>>>
>>>>
>>>> Alignment
>>>>
>>>> Apache is, by far, the most natural home for taking the AsterixDB
>>>> project forward. A large fraction of today's top Big Data
>>>> technologies have their homes in Apache, including Hadoop, YARN, Pig,
>>>> Hive, Spark, Flink, HBase, Cassandra and others. AsterixDB fills a
>>>> significant gap -- the parallel data management system gap -- that
>>>> exists in the Big Data open source world. It is well-aligned with a
>>>> number of the Apache projects, e.g., it has strong support for
>>>> accessing and indexing external data in HDFS, and it uses YARN as an
>>>> answer to basic cluster resource management. AsterixDB also seeks to
>>>> achieve an Apache-style development model; it is seeking a broader
>>>> community of contributors and users in order to achieve its full
>>>> potential and value to the Big Data community.
>>>>
>>>> There are also a number of related Apache projects and dependencies
>>>> that will be mentioned below in the "Relationships with Other Apache
>>>> Products" section.
>>>>
>>>>
>>>> Known Risks
>>>>
>>>> Orphaned products
>>>>
>>>> Given the current level of intellectual investment in AsterixDB, the
>>>> risk of the project being abandoned is very small. The UCI/UCR
>>>> faculty team leads are highly incentivized to continue development
>>>> since the database groups at UC Irvine and UC Riverside are both
>>>> reliant on AsterixDB as a platform for long-term graduate research
>>>> projects. UC San Diego is also beginning to contribute to the code
>>>> base, and a collaboration involving public health applications is
>>>> forming with UCLA. The work on AsterixDB is managed via a mix of
>>>> mailing list discussions supplemented by weekly project status
>>>> meetings which are summarized on the mailing list. Typical (local
>>>> plus Skype-in) attendance at the weekly status meetings runs at about
>>>> 20 active contributors.
>>>>
>>>>
>>>> Inexperience with Open Source
>>>>
>>>> AsterixDB and Hyracks were completely developed in Open Source under
>>>> the ASL 2.0. The source code repositories, issue tracker, and mailing
>>>> lists are available on Google Code and discussions and decisions
>>>> happen on the mailing lists (which is necessary due to the geographic
>>>> distribution of the current developers).
>>>>
>>>> Also, a few of the initial committers have contributed to Apache
>>>> projects. Vinayak Borkar is a committer on the Apache Helix and
>>>> Apache VXQuery projects. Till Westmann is the VP VXQuery at the ASF
>>>> and an IPMC member. Preston Carman and Steven Jacobs are committers
>>>> on the Apache VXQuery project.
>>>>
>>>>
>>>> Relationships with Other Apache Products
>>>>
>>>> Apache VXQuery is based on the Hyracks data-parallel runtime, which
>>>> is also included in the AsterixDB code base.
>>>>
>>>> AsterixDB is closely related to Apache Hadoop. Included in AsterixDB
>>>> is support for accessing external data in HDFS (and Hive formats),
>>>> and resource management and system administration features are in the
>>>> process of being migrated to YARN.
>>>>
>>>> AsterixDB's AQL query facilities offer comparable query power to
>>>> Apache's Pig and Hive systems for big data analytics. AsterixDB
>>>> differs in storing and indexing data and thus being able to quickly
>>>> answer small and medium queries without large HDFS data scans -
>>>> thereby targeting a different class of use cases.
>>>>
>>>> AsterixDB's data storage and indexing facilities are similar to those
>>>> of HBase, but AsterixDB differs in being a much more complete and
>>>> queryable BDMS (not just a key-value style store).
>>>>
>>>> AsterixDB's target use cases are not in-memory processing or
>>>> iterative algorithm support, making AsterixDB complementary to the
>>>> Apache Spark platform. (Spark interoperability is on our longer-term
>>>> to-do wishlist.)
>>>>
>>>>
>>>> Homogeneous Developers
>>>>
>>>> As mentioned before, the current community is already organizationally
>>>> and geographically distributed - and we would like to increase the
>>>> heterogeneity.
>>>>
>>>>
>>>> Reliance on Salaried Developers
>>>>
>>>> Of the initial committers only 3 are full-time UCI staff. The other
>>>> committers are a mix of students, alumni who continue to contribute
>>>> to the effort, and individuals working with permission part-time (or
>>>> in spare time) on this project.
>>>>
>>>>
>>>> An Excessive Fascination with the Apache Brand
>>>>
>>>> We believe in the processes, systems, and framework Apache has put in
>>>> place. Apache is also known to foster a great community around their
>>>> projects and provide exposure. While brand is important, our
>>>> fascination with it is not excessive. We believe that the ASF is the
>>>> right home for AsterixDB and that having AsterixDB inside of the ASF
>>>> will lead to a better long-term outcome for the Big Data community.
>>>>
>>>>
>>>> Documentation
>>>>
>>>> Documentation and publications related to AsterixDB can be found at
>>>> http://asterixdb.ics.uci.edu/.
>>>>
>>>>
>>>> Initial Source
>>>>
>>>> Current source resides in Google Code:
>>>> https://code.google.com/p/asterixdb/ (query language and upper system
>>>> layers) and https://code.google.com/p/hyracks/ (dataflow runtime
>>>> system and storage management libraries).
>>>>
>>>>
>>>> External Dependencies
>>>>
>>>> AsterixDB depends on a number of Apache projects:
>>>>
>>>> - Ant
>>>> - Avro
>>>> - ApacheDB JDO
>>>> - Commons
>>>> - Derby
>>>> - Hadoop
>>>> - Hive
>>>> - HTTPComponents
>>>> - Jakarta ORO
>>>> - Maven
>>>> - Tomcat
>>>> - Thrift
>>>> - Velocity
>>>> - Wicket
>>>> - Xerces
>>>>
>>>> and other open source projects (organized by license):
>>>>
>>>> -- ASL 2.0:
>>>> - Jackson
>>>> - Google Guava
>>>> - Google Guice
>>>> - JSON-simple
>>>> - BoneCP
>>>> - Microsoft Azure SDK
>>>> - Netty
>>>> - Rome
>>>> - JetS3t
>>>> - Groovy
>>>> - Jettison
>>>> - Plexus
>>>> - Datanucleus (JDO)
>>>> - Jetty
>>>> - Twitter4J
>>>> - Snappy-java
>>>>
>>>> -- BSD:
>>>> - Antlr
>>>> - ObjectWeb ASM
>>>> - Protobuf
>>>> - JSCH
>>>> - JavaCC
>>>> - Paranamer
>>>> - JLine
>>>> - Stax
>>>> - StringTemplate
>>>> - xmlEnc
>>>>
>>>> -- MIT
>>>> - AppAssembler
>>>> - SimpleLog4J
>>>>
>>>> -- CDDL 1.0
>>>> - Java Activation Framework
>>>> - Java Transactions
>>>> - Java Servlet API
>>>> - Grizzly
>>>> - gmbal
>>>> - Glassfish
>>>>
>>>> -- CDDL 1.1
>>>> - Jersey
>>>> - JAXB Reference Implementation
>>>>
>>>> -- JSON License
>>>> - JSON
>>>>
>>>> -- EPL 1.0
>>>> - JUnit
>>>>
>>>> -- JDOM License
>>>> - JDOM
>>>>
>>>> -- Public Domain
>>>> - xz
>>>> - AOPAlliance
>>>>
>>>> As all dependencies are managed using Apache Maven, none of the
>>>> external libraries need to be packaged in a source distribution.
>>>>
>>>>
>>>> Required Resources
>>>>
>>>> Developer and user mailing lists
>>>>
>>>> private@asterixdb.incubator.apache.org (with moderated subscriptions)
>>>> commits@asterixdb.incubator.apache.org
>>>> dev@asterixdb.incubator.apache.org
>>>> users@asterixdb.incubator.apache.org
>>>>
>>>>
>>>> A git repository
>>>>
>>>> https://git-wip-us.apache.org/repos/asf/incubator-asterixdb.git
>>>>
>>>>
>>>> A JIRA issue tracker
>>>>
>>>> https://issues.apache.org/jira/browse/ASTERIXDB
>>>>
>>>>
>>>> Initial Committers
>>>>
>>>> The following is a list of the planned initial Apache committers (the
>>>> active subset of the committers for the current repository at Google
>>>> code).
>>>>
>>>> Abdullah Alamoudi (bamousaa@gmail.com)
>>>> Cameron Samak (eufery@gmail.com)
>>>> Chen Li (chenli@gmail.com)
>>>> Ian Maxon (imaxon@uci.edu)
>>>> Ildar Absalyamov (ildar.absalyamov@gmail.com)
>>>> Jianfeng Jia (jianfeng.jia@gmail.com)
>>>> Keren Ouaknine (kereno@gmail.com)
>>>> Markus Dreseler (apache@dreseler.de)
>>>> Mike Carey (dtabass@apache.org)
>>>> Murtadha Hubail (hubailmor@gmail.com)
>>>> Pouria Pirzadeh (pouria.pirzadeh@gmail.com)
>>>> Preston Carman (prestonc@apache.org)
>>>> Raman Grover (RamanGrover29@gmail.com)
>>>> Sattam Alsubaiee (salsubaiee@gmail.com)
>>>> Steven Jacobs (sjaco002@apache.org)
>>>> Taewoo Kim (wangsaeu@gmail.com)
>>>> Till Westmann (tillw@apache.org)
>>>> Vinayak Borkar (vinayakb@apache.org)
>>>> Yingyi Bu (buyingyi@gmail.com)
>>>> Young-Seok Kim (kisskys@gmail.com)
>>>> Zach Heilbron (zheilbron@gmail.com)
>>>>
>>>>
>>>> Affiliations
>>>>
>>>> UC Irvine
>>>> - Mike Carey
>>>> - Chen Li
>>>> - Ian Maxon
>>>> - Yingyi Bu
>>>> - Raman Grover
>>>> - Pouria Pirzadeh
>>>> - Young-Seok Kim
>>>> - Cameron Samak
>>>> - Taewoo Kim
>>>> - Jianfeng Jia
>>>> - Murtadha Hubail
>>>> - Markus Dreseler
>>>>
>>>> UC Riverside
>>>> - Ildar Absalyamov
>>>> - Preston Carman
>>>> - Steven Jacobs
>>>>
>>>> Hebrew University
>>>> - Keren Ouaknine
>>>>
>>>> Oracle
>>>> - Till Westmann
>>>>
>>>> X15 Software
>>>> - Vinayak Borkar
>>>> - Zach Heilbron
>>>>
>>>> KACST Saudi Arabia
>>>> - Sattam Alsubaiee
>>>>
>>>> Saudi Aramco
>>>> - Abdullah Alamoudi
>>>>
>>>> Carey, Li, and Maxon are full-time UCI staff, with the remaining UCI
>>>> (UC Irvine) and UCR (UC Riverside) affiliates being students. The
>>>> non-UC committers are a mix of alumni who continue to contribute to
>>>> the effort and individuals working with permission part-time (or in
>>>> spare time) on this project.
>>>>
>>>>
>>>> Sponsors
>>>>
>>>> Champion
>>>>
>>>> Chris Mattmann (NASA/JPL)
>>>>
>>>> Nominated Mentors
>>>>
>>>> TBD
>>>>
>>>> Sponsoring Entity
>>>>
>>>> The Apache Incubator
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>> For additional commands, e-mail: general-help@incubator.apache.org
>>>
>>
>>

