incubator-general mailing list archives

From Luke Han <luke...@gmail.com>
Subject Re: [VOTE] Accept Concerted into the Apache Incubator
Date Sun, 11 Oct 2015 23:59:48 GMT
+1 (non-binding)


Best Regards!
---------------------

Luke Han

On Mon, Oct 12, 2015 at 4:33 AM, Alan D. Cabrera <list@toolazydogs.com>
wrote:

> +1 - binding
>
>
> Regards,
> Alan
>
> > On Oct 9, 2015, at 8:55 AM, Atri Sharma <atri@apache.org> wrote:
> >
> > Hi all,
> >
> > Following the discussion about Concerted I would like to call a vote for
> > accepting Concerted as a new incubator project.
> >
> > The proposal text is included below, and available on the wiki:
> >
> > https://wiki.apache.org/incubator/ConcertedProposal
> >
> > The vote is open for 72 hours:
> >
> > [ ] +1 accept Concerted in the Incubator
> > [ ] ±0
> > [ ] -1 (please give reason)
> >
> > Regards,
> >
> > Atri
> >
> > = Abstract =
> >
> > Concerted is an in-memory write-less, read-more engine that aims to
> > provide extreme read performance with a very high degree of concurrency
> > and scalability while minimizing its own resource footprint.
> >
> > = Proposal =
> > Concerted is built on the principle that a new type of workload is
> > dominating the scene and now needs to be supported: large data set
> > analytical workloads run on large clusters or high-powered machines.
> > Large analytical workloads depend on the ability to query large data
> > sets efficiently and with high concurrency while maintaining semantics
> > such as immediate consistency. An in-memory engine designed to support
> > extreme read queries while providing support for aggregation through
> > various features (such as multidimensional representation of tuples)
> > will accelerate many use cases around large-scale analytics.
> >
> > Concerted believes that the best understanding of a user application
> > lies with the application's developer. Massive read scaling should be
> > available on demand and flexible enough that the user can decide which
> > representation and access of data suits their current requirements.
> > Hence, Concerted is not built in a traditional client/server model.
> > Concerted provides users with an API which can be used to load, read,
> > update and delete data. The user chooses which data structure to use for
> > their current requirements. All API access is covered by Concerted's
> > internal systems, such as the lock manager, transaction manager and
> > cache manager, which ensure that reads scale to a high level in every
> > API call.
> >
> > Concerted is a do-it-yourself in-memory platform for building in-memory
> > supporting engines. The use case we have in mind is supporting big data
> > warehouses like Hive, but there are many other use cases for a custom,
> > highly scalable in-memory platform.
> >
> > The goal of this proposal is to leverage an existing code base, available
> > on GitHub and licensed under the Apache License 2.0, to build a community
> > around the project. Currently the community consists of existing hackers
> > of Concerted, people who have been following and associated with the
> > project for a while, and database experts who are excited about building
> > a project like this. We hope that entering Apache will help us attract
> > more contributors and connect with existing big data projects like
> > Apache Hive, Apache HAWQ, Apache Storm, Apache Tajo, Apache Spark and
> > Apache Geode, to leverage their community base while assisting in their
> > use cases with Concerted. We had a discussion with the founders of Apache
> > Tajo and they showed interest in using Concerted for some of their use
> > cases.
> > = Background =
> > Relational databases were built with the cost of physical memory in mind.
> > That cost is no longer very relevant, and physical memory is now
> > available on demand. Another driving factor behind Concerted is the
> > paradigm shift that has come with big data: disk I/O speeds are more of
> > a bottleneck than ever before. Combining the read dominance of
> > analytical workloads with the speed of in-memory structures, Concerted
> > fits the current scene. Supporting OLAP workloads with in-memory support
> > for faster read constant queries and joins will also be useful.
> >
> > = Rationale =
> > As explained above, large analytical workloads need a lightweight
> > in-memory engine which supports massive read concurrency, ground-level
> > support for aggregations and analytics, extreme scalability and high read
> > performance. Concerted aims to solve these needs. Concerted is designed
> > and built with three objectives:
> >
> >
> > Performance
> >    To provide high-performance access to data from a large number of
> > rows, Concerted uses efficient representation and in-memory indexing of
> > data coupled with high-performance transactions, custom transactions,
> > lightweight locking and lockless techniques, and an intelligent locking
> > manager.
> >
> > Scalability
> >    Concerted is built with extreme concurrency and scalability in mind.
> >
> > Efficiency
> >    Concerted aims to give the expected performance under a wide variety
> > of workloads and to have as low a footprint as possible.
> >
> > = Initial Goals =
> > The initial goal is to leverage an existing code base and invest in
> > building a community around the project. We anticipate a lot of initial
> > restructuring of the existing code so that it becomes easier to include
> > new contributors and minimize ramp-up time. We plan to approach this
> > refactoring in a fully transparent, community-driven way, thus starting
> > to practice the "Apache Way" governance model from the get-go.
> >
> > Various contributors are getting individual changes into branches in the
> > GitHub repository, and our initial major goal will be to merge all those
> > changes into master.
> >
> > = Current Status =
> > Concerted is currently being restructured to suit the needs of an open
> > source project. The current source is available at
> > https://github.com/atris/Concerted (please note that the updated code
> > base is not yet present on GitHub). Concerted is currently licensed
> > under the Apache License 2.0. Most of the code base is implemented in C
> > and C++ and has external dependencies listed later.
> >
> > == Meritocracy ==
> >
> > We plan to drive the technical roadmap and implementation in a fully
> > transparent, community-driven way, soliciting feedback from all of the
> > community members and building a consensus-driven approach to evolving
> > the code base and the community itself. Users and new contributors will
> > be treated with respect and welcomed. By participating in the community
> > and providing quality patches/support that move the project forward,
> > contributors will earn merit. They also will be encouraged to provide
> > non-code contributions (documentation, events, community management,
> > etc.) and will gain merit for doing so. Those with a proven support and
> > quality track record will be encouraged to become committers.
> >
> > == Community ==
> > In-memory is the new cutting edge, and a new community around
> > performance-oriented systems, and around enhancing relational database
> > performance with complete in-memory OLTP engines, will greatly benefit
> > performance. So we expect interest from data warehousing projects and
> > communities as well as from projects and companies looking for high OLTP
> > performance. In addition, Ingenium Data Systems is building products
> > around Concerted and will have salaried developers contribute to the
> > project as part of their job responsibilities.
> >
> > == Core Developers ==
> > Core developers are a diverse group of developers, many of whom are very
> > experienced in open source and the Apache Hadoop ecosystem. Specifically,
> > Atri is an Apache Apex committer, and Atri and Pavel are major
> > contributors to the PostgreSQL project. Atri is also a committer for
> > other open source projects.
> >
> > * Amrish <amrishs AT ingeniumsys DOT com>
> > * Nupur S <nupurs AT ingeniumsys DOT com>
> > * Pavel Stehule <pavel DOT stehule AT gmail.com>
> > * Atri Sharma <atri AT apache DOT org>
> > * Nishith Singhal <nishsinghal AT gmail DOT com>
> > * Michael Down <michael AT dowuk DOT com>
> > * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
> > * Wang Albert <albertwang87 AT gmail DOT com>
> > * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
> > * Kris Popat <krispopat AT apache DOT org>
> > * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
> >
> > == Alignment ==
> > Concerted will be helpful to systems like Tajo, which can benefit from
> > in-memory structures optimized for heavy reads and joins (dimension
> > tables). In addition, Concerted will benefit projects looking for an
> > in-memory relational database as a metadata store, which is the case for
> > most of the Apache Big Data projects. We expect Apache HAWQ (incubating),
> > Apache Hive, Apache Storm and Apache Tajo to utilize Concerted as a
> > supporting engine. For example, a data warehouse built on HAWQ, Hive or
> > Tajo can utilize Concerted as an in-memory engine for querying and
> > joining dimensional tables.
> >
> > = Known Risks =
> >
> > == Orphaned Products ==
> > Most of the code is developed by a small group of core developers, and
> > this may pose a risk of an orphaned product. However, the code base is
> > simple compared to other open source projects, and the interest level in
> > Concerted has risen exponentially over the years, with many computer
> > professionals expressing interest in the project and trying out some use
> > cases with it. Specifically, there were some projects done around
> > Concerted at JIIT, Noida (an engineering school), and Wang is a student
> > at Lehigh University who has been following Concerted's progress over
> > many years. The core developers are aligned with this project and, since
> > the code base is simple, future committers will have a quick ramp-up and
> > the risk will be mitigated. Besides, Ingenium Data Systems is launching
> > a product based on Concerted and will have all its salaried developers
> > contribute to Concerted as part of their job functions.
> >
> > == Inexperience with Open Source ==
> > Most of the initial committers have experience working on open source
> > projects. In particular, Atri is an active member of many open source
> > projects.
> >
> > == Homogeneous Developers ==
> > Although the initial core developers were based in India, the community
> > now consists of computer professionals from various parts of the world,
> > hence diversity should not be an issue. In addition, we will be
> > documenting the internals of the project in public-facing documents,
> > which should allow more contributors to join in.
> >
> > == Reliance on Salaried Developers ==
> > It is expected that Concerted development will occur on both salaried
> > time and volunteer time. Nupur and Amrish belong to Ingenium and are
> > committed to building this project along with their team. Atri, as the
> > originator of this project, will be actively working on the project and
> > is now pushing Concerted into major data warehousing projects, since he
> > is involved in the architecture of data platforms. Developers are
> > expected to contribute in their volunteer time. In addition, we will be
> > working with various open source projects which will benefit from
> > Concerted and will be involving those communities in Concerted's
> > development as well. For example, Apache Tajo has shown interest and
> > will be supporting development of the project.
> >
> > == Relationships with Other Apache Products ==
> > Concerted has some overlapping functionality with Apache Geode
> > (incubating). However, Geode is an in-memory key-value store whereas
> > Concerted is a write-less, read-many engine. Concerted will complement
> > Geode and increase the use cases Geode can support.
> >
> > A major objective for Concerted is supporting OLAP workloads and data
> > warehouses with in-memory performance and highly performant reads and
> > joins. Concerted will be collaborating with many open source projects,
> > such as Apache HAWQ (incubating), Apache Hive and Apache Tajo, to support
> > their OLAP workloads, enabling them to support a larger set of use cases
> > with better throughput. For example, a star schema in Hive will benefit
> > from having dimension tables in Concerted, where reads are highly
> > efficient and scalable and joins are very fast. A similar workload
> > applies to Tajo.
> >
> > Concerted will fit many other use cases in the Apache spectrum as well.
> > For example, Concerted can be used with Apache Geode for in-memory
> > aggregation indexing. Concerted can also be used with Apache Flink to
> > stream real-time data into memory, perform in-memory aggregation and
> > then perform batch processing for efficiency.
> >
> >
> > == An Excessive Fascination with the Apache Brand ==
> > We believe that the "Apache Way" governance model will provide additional
> > help to us in finding contributors and growing the community. The
> > community and development process will make this project more stable and
> > help establish ubiquitous APIs. In addition, Concerted is looking to
> > support multiple Apache projects in their use cases and accelerate their
> > performance while soliciting their support in development of the project.
> > We will not be using the Apache brand for excessive branding or with any
> > commercial aspects of Concerted. The Apache brand will primarily be used
> > for community building.
> >
> > = Documentation =
> > Public documents are currently in development and will be published soon.
> >
> > = Initial Source =
> > The initial source is written in C++ and is heavily in development. It
> > will be restructured and released publicly.
> > We understand that there might be concerns about the GitHub source being
> > developed by only a single person and development not happening after
> > 2013. The source on GitHub is only the source initially developed as an
> > independent project, hence the limitation. However, because the project
> > has been present on GitHub for a while now, it has attracted attention
> > and people have been using and developing it locally. For example,
> > Ingenium Data Systems took an interest in the project, developed it
> > locally and used it in an upcoming product they are going to release
> > soon. The project now wants to bring together all independent
> > development efforts and help attract people to grow the community and
> > project. We are currently in the process of updating the GitHub
> > repository and making branches for all local development efforts.
> >
> > = Source and Intellectual Property Submission Plan =
> >
> > We intend the entire code base to be licensed under the Apache License,
> > Version 2.0.
> >
> > = External Dependencies =
> > Currently, Concerted depends only on the g++ compiler and pthreads.
> > pthreads will be replaced by Boost in the next release.
> >
> > = Cryptography =
> >
> > N/A
> >
> > = Required Resources =
> > == Mailing List ==
> > * private@concerted.incubator.apache.org (moderated subscriptions)
> > * commits@concerted.incubator.apache.org
> > * dev@concerted.incubator.apache.org
> > * issues@concerted.incubator.apache.org
> >
> > == Git Repository ==
> >
> > https://git-wip-us.apache.org/repos/asf/incubator-concerted.git
> >
> > == Issue Tracking ==
> > Jira Concerted (CONCERTED)
> >
> > == Other Resources ==
> > * Continuous Integration
> >  * Jenkins
> > * Wiki
> >  * cwiki.apache.org/confluence/display/CONCERTED
> >
> > = Initial Committers =
> > * Roman Shaposhnik <rvs AT apache DOT org>
> > * Daniel Dai <daijy AT apache DOT org>
> > * Jake Farrell <jfarrell AT apache DOT org>
> > * Lars Hofhansl <larsh AT apache DOT org>
> > * Julian Hyde <jhyde AT apache DOT org>
> > * Chris Nauroth <cnauroth AT hortonworks DOT com>
> > * Pavel Stehule <pavel DOT stehule AT gmail.com>
> > * Amrish <amrishs AT ingeniumsys DOT com>
> > * Nupur S <nupurs AT ingeniumsys DOT com>
> > * Atri Sharma <atri AT apache DOT org>
> > * Nishith Singhal <nishsinghal AT gmail DOT com>
> > * Michael Down <michael AT dowuk DOT com>
> > * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
> > * Wang Albert <albertwang87 AT gmail DOT com>
> > * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
> > * Kris Popat <krispopat AT apache DOT org>
> > * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
> >
> > = Affiliations =
> > * Roman Shaposhnik (Pivotal)
> > * Daniel Dai (HortonWorks)
> > * Jake Farrell (Acquia)
> > * Lars Hofhansl (Salesforce)
> > * Julian Hyde (HortonWorks)
> > * Chris Nauroth (HortonWorks)
> > * Pavel Stehule (GoodData)
> > * Amrish (Ingenium Data Systems)
> > * Nupur S (Ingenium Data Systems)
> > * Atri Sharma (Barclays)
> > * Nishith Singhal (Wipro)
> > * Michael Down (Barclays)
> > * Vijayakumar Ramdoss (EMC)
> > * Wang Albert (Lehigh University)
> > * Hans-Jurgen Schonig (CyberTec)
> > * Kris Popat (CETIS LLP)
> > * Ayrton Gomesz (IQLabs)
> >
> > The nominated mentors are employees of HortonWorks, Acquia, and
> > Salesforce.
> >
> > * Daniel Dai (HortonWorks)
> > * Jake Farrell (Acquia)
> > * Lars Hofhansl (Salesforce)
> > * Julian Hyde (HortonWorks)
> > * Chris Nauroth (HortonWorks)
> >
> > = Sponsors =
> >
> > == Champion ==
> >
> > * Roman Shaposhnik (rvs AT apache DOT org)
> >
> > == Nominated Mentors ==
> >
> > * Daniel Dai <daijy AT apache DOT org>
> > * Jake Farrell <jfarrell AT apache DOT org>
> > * Lars Hofhansl <larsh AT apache DOT org>
> > * Julian Hyde <jhyde AT apache DOT org>
> > * Chris Nauroth <cnauroth AT hortonworks DOT com>
> >
> > == Sponsoring Entity ==
> > Apache Incubator
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>
