incubator-general mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Re: [VOTE] Accept Concerted into the Apache Incubator
Date Fri, 09 Oct 2015 21:12:36 GMT
Thanks for clarifying.

+1 (binding)

Julian


On Fri, Oct 9, 2015 at 2:09 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
> Hi,
>
> Please find answers below:
>
> 1) The main source code on GitHub hasn't been updated for a while. However,
> the original core was written in 2013 and has been open source since then.
> As we discussed earlier, the current code base is only a starting point for
> full development: it will first be integrated with the work done
> independently in silos and then used as the starting implementation.
>
> 2) The JNI native API, when optimized, can provide great performance (I have
> written an application using it, and it has been on production systems for
> many years). I think we can still provide a high-performance API to the C++
> core, and that is something I am personally working on right now.
> On 10 Oct 2015 02:31, "Julian Hyde" <jhyde@apache.org> wrote:
>
>> I have agreed to be a mentor to Concerted and I think it is an
>> interesting idea. I am inclined to vote for it entering the incubator.
>>
>> However since the project has not released any source code yet, there
>> are a couple of questions I'd like to get answered for the record:
>>
>> 1. How many lines of existing code are there? What is their approximate
>> age?
>>
>> 2. Concerted is in C/C++ but you mention interfacing with JVM-based
>> products like Hive. How would you interface with other languages? Is
>> it a goal of the project to create APIs to other languages such as
>> Java? Would access from those languages be as efficient as native
>> access?
>>
>> I apologize that I didn't bring these up in the discussion thread.
>>
>> Julian
>>
>>
>> On Fri, Oct 9, 2015 at 11:53 AM, Ayrton Gomesz <com.ayrton@gmail.com>
>> wrote:
>> > +1
>> > @henry.saputra thanks man
>> > On Oct 9, 2015 5:50 PM, "Henry Saputra" <henry.saputra@gmail.com> wrote:
>> >
>> >> +1 (binding)
>> >> Good luck guys!
>> >>
>> >> On Fri, Oct 9, 2015 at 8:55 AM, Atri Sharma <atri@apache.org> wrote:
>> >> > Hi all,
>> >> >
>> >> > Following the discussion about Concerted I would like to call a vote
>> >> > for accepting Concerted as a new incubator project.
>> >> >
>> >> > The proposal text is included below, and available on the wiki:
>> >> >
>> >> > https://wiki.apache.org/incubator/ConcertedProposal
>> >> >
>> >> > The vote is open for 72 hours:
>> >> >
>> >> > [ ] +1 accept Concerted in the Incubator
>> >> > [ ] ±0
>> >> > [ ] -1 (please give reason)
>> >> >
>> >> > Regards,
>> >> >
>> >> > Atri
>> >> >
>> >> > = Abstract =
>> >> >
>> >> > Concerted is an in-memory, write-less/read-more engine that aims to
>> >> > provide extreme read performance with a very high degree of
>> >> > concurrency and scalability, with a focus on minimizing its own
>> >> > resource footprint.
>> >> >
>> >> > = Proposal =
>> >> > Concerted is built on the principle that a new type of workload is
>> >> > dominating the scene and now needs to be supported: large-data-set
>> >> > analytical workloads run on large clusters or high-powered machines.
>> >> > Large analytical workloads depend on the ability to query large data
>> >> > sets efficiently and with high concurrency while maintaining
>> >> > semantics such as immediate consistency. An in-memory engine designed
>> >> > to support extreme read queries while providing support for
>> >> > aggregation through various features (such as multidimensional
>> >> > representation of tuples) will accelerate many use cases around
>> >> > large-scale analytics.
>> >> >
>> >> > Concerted believes that the best understanding of a user application
>> >> > lies with the application's developer. Massive read scaling should be
>> >> > available on demand and be flexible enough that the user can decide
>> >> > which representation and access method for the data suits his/her
>> >> > current requirements. Hence, Concerted is not built on a traditional
>> >> > client/server model. Concerted provides users with an API which can
>> >> > be used to load, read, update and delete data. The user chooses which
>> >> > data structure to use for the current requirements. All API access is
>> >> > covered by Concerted's internal systems, such as the lock manager,
>> >> > transaction manager and cache manager, which ensure that reads scale
>> >> > to a high level in every API call.
>> >> >
>> >> > Concerted is a do-it-yourself in-memory platform for building
>> >> > in-memory supporting engines. The use case we have in mind is
>> >> > supporting big data warehouses like Hive, but there are endless use
>> >> > cases for a custom, highly scalable in-memory platform.
>> >> >
>> >> > The goal of this proposal is to leverage an existing code base,
>> >> > available on GitHub and licensed under the Apache License 2.0, to
>> >> > build a community around the project. Currently the community
>> >> > consists of existing hackers of Concerted, people who have been
>> >> > following and associated with the project for a while, and database
>> >> > experts who are excited about building a project like this. We are
>> >> > hoping that entering Apache will help us attract more contributors as
>> >> > well as connect with existing big data projects like Apache Hive,
>> >> > Apache HAWQ, Apache Storm, Apache Tajo, Apache Spark and Apache
>> >> > Geode, leveraging their community base while assisting in their use
>> >> > cases with Concerted. We had a discussion with the founders of Apache
>> >> > Tajo and they showed interest in using Concerted for some of their
>> >> > use cases.
>> >> > = Background =
>> >> > Relational databases were built with the cost of physical memory in
>> >> > mind. That cost is no longer very relevant, and physical memory is
>> >> > now available on demand. Another driving factor behind Concerted is
>> >> > the paradigm shift that has come with big data: disk I/O speeds are
>> >> > more of a bottleneck than ever before. Combining the read dominance
>> >> > of analytical workloads with the speed of in-memory structures,
>> >> > Concerted fits the current scene. Also, supporting OLAP workloads
>> >> > with in-memory support for faster read constant queries and joins
>> >> > will be useful.
>> >> >
>> >> > = Rationale =
>> >> > As explained above, large analytical workloads need a lightweight
>> >> > in-memory engine which supports massive read concurrency,
>> >> > ground-level support for aggregations and analytics, extreme
>> >> > scalability and high read performance, along with the engine being
>> >> > very light itself. Concerted aims to meet these needs. Concerted is
>> >> > designed and built with three goals:
>> >> >
>> >> >
>> >> > Performance
>> >> >     To provide high-performance access to data from a large number of
>> >> >     rows, Concerted uses efficient representation and in-memory
>> >> >     indexing of data, coupled with high-performance transactions,
>> >> >     custom transactions, lightweight locking and lockless techniques,
>> >> >     and an intelligent locking manager.
>> >> >
>> >> > Scalability
>> >> >     Concerted is built with extreme concurrency and scalability in
>> >> >     mind.
>> >> >
>> >> > Efficiency
>> >> >     Concerted aims to give expected performance under a vast variety
>> >> >     of workloads and aims to have as low a footprint as possible.
>> >> >
>> >> > = Initial Goals =
>> >> > The initial goal is to leverage an existing code base and invest in
>> >> > building a community around the project. We anticipate a lot of
>> >> > initial restructuring of the existing code so that it becomes easier
>> >> > to include new contributors and minimize ramp-up time. We plan to
>> >> > approach this refactoring in a fully transparent, community-driven
>> >> > way, thus starting to practice the "Apache Way" governance model from
>> >> > the get-go.
>> >> >
>> >> > Various contributors are getting individual changes into branches in
>> >> > the GitHub repository, and our initial major goal will be to merge
>> >> > all those changes into the master repository.
>> >> >
>> >> > = Current Status =
>> >> > Concerted is currently being restructured to suit the needs of an
>> >> > open source project. The current source is available at
>> >> > https://github.com/atris/Concerted (please note that the updated
>> >> > codebase is not yet present on GitHub). Concerted is currently
>> >> > licensed under the Apache License 2.0. Most of the code base is
>> >> > implemented in C and C++ and has external dependencies listed later.
>> >> >
>> >> > == Meritocracy ==
>> >> >
>> >> > We plan to drive the technical roadmap and implementation in a fully
>> >> > transparent, community-driven way soliciting feedback from all of the
>> >> > community members and building a consensus-driven approach to
>> >> > evolving the code base and the community itself. Users and new
>> >> > contributors will be treated with respect and welcomed. By
>> >> > participating in the community and providing quality patches/support
>> >> > that move the project forward, contributors will earn merit. They
>> >> > also will be encouraged to provide non-code contributions
>> >> > (documentation, events, community management, etc.) and will gain
>> >> > merit for doing so. Those with a proven support and quality track
>> >> > record will be encouraged to become committers.
>> >> >
>> >> > == Community ==
>> >> > In-memory is the new cutting edge, and a new community around
>> >> > performance-oriented systems, and around enhancing relational
>> >> > database performance with complete in-memory OLTP engines, will
>> >> > greatly benefit performance. So we expect interest from data
>> >> > warehousing projects and communities, as well as from projects and
>> >> > companies looking for high OLTP performance. In addition, Ingenium
>> >> > Data Systems is building products around Concerted and will have
>> >> > salaried developers contribute to the project as part of their job
>> >> > responsibilities.
>> >> >
>> >> > == Core Developers ==
>> >> > Core developers are a diverse group of developers, many of whom are
>> >> > very experienced in open source and the Apache Hadoop ecosystem.
>> >> > Specifically, Atri is an Apache Apex committer, and Atri and Pavel
>> >> > are major contributors to the PostgreSQL project. Atri is also a
>> >> > committer for other open source projects.
>> >> >
>> >> >  * Amrish <amrishs AT ingeniumsys DOT com>
>> >> >  * Nupur S <nupurs AT ingeniumsys DOT com>
>> >> >  * Pavel Stehule <pavel DOT stehule AT gmail.com>
>> >> >  * Atri Sharma <atri AT apache DOT org>
>> >> >  * Nishith Singhal <nishsinghal AT gmail DOT com>
>> >> >  * Michael Down <michael AT dowuk DOT com>
>> >> >  * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
>> >> >  * Wang Albert <albertwang87 AT gmail DOT com>
>> >> >  * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
>> >> >  * Kris Popat <krispopat AT apache DOT org>
>> >> >  * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
>> >> >
>> >> > == Alignment ==
>> >> > Concerted will be helpful to systems like Tajo, which can benefit
>> >> > from in-memory structures optimized for heavy reads and joins
>> >> > (dimension tables). In addition, Concerted will benefit projects
>> >> > looking for an in-memory relational database as a metadata store,
>> >> > which is the case for most of the Apache Big Data projects. We expect
>> >> > Apache HAWQ (incubating), Apache Hive, Apache Storm and Apache Tajo
>> >> > to utilize Concerted as a supporting engine. For example, a data
>> >> > warehouse built on HAWQ, Hive or Tajo can utilize Concerted as an
>> >> > in-memory engine for querying and joining dimensional tables.
>> >> >
>> >> > = Known Risks =
>> >> >
>> >> > == Orphaned Products ==
>> >> > Most of the code was developed by a small group of core developers,
>> >> > and this may pose a risk of an orphaned product. However, the code
>> >> > base is simple compared to other open source projects, and the
>> >> > interest level in Concerted has risen exponentially over the years,
>> >> > with many computer professionals expressing interest in the project
>> >> > and building some use cases on it. Specifically, there were some
>> >> > projects done around Concerted at JIIT, Noida (an engineering
>> >> > school), and Wang is a student at Lehigh University who has been
>> >> > following Concerted's progress over many years. The core developers
>> >> > are aligned with this project, and since the code base is simple,
>> >> > future committers will have a quick ramp-up and the risk will be
>> >> > mitigated. Besides, Ingenium Data Systems is launching a product
>> >> > based on Concerted and will have all its salaried developers
>> >> > contribute to Concerted as a part of their job functions.
>> >> >
>> >> > == Inexperience with Open Source ==
>> >> > Most of the initial committers have experience working on open source
>> >> > projects. In particular, Atri is an active member of many open source
>> >> > projects.
>> >> >
>> >> > == Homogeneous Developers ==
>> >> > Although the initial core developers were based out of India, the
>> >> > community now consists of computer professionals from various parts
>> >> > of the world, hence diversity should not be an issue. In addition, we
>> >> > will be documenting the internals of the project in public-facing
>> >> > documents, which should allow more contributors to join in.
>> >> >
>> >> > == Reliance on Salaried Developers ==
>> >> > It is expected that Concerted development will occur on both salaried
>> >> > time and volunteer time. Nupur and Amrish belong to Ingenium and are
>> >> > committed to building this project along with their team. Atri, as
>> >> > the originator of this project, will be actively working on the
>> >> > project and is now pushing Concerted into major data warehousing
>> >> > projects, since he is involved in the architecture of data platforms.
>> >> > Developers are expected to contribute in their volunteer time. In
>> >> > addition, we will be working with various open source projects which
>> >> > will benefit from Concerted, and will involve those communities in
>> >> > Concerted's development as well. For example, Apache Tajo has shown
>> >> > interest and will be supporting development of the project.
>> >> >
>> >> > == Relationships with Other Apache Products ==
>> >> > Concerted has some overlapping functionality with Apache Geode
>> >> > (incubating). However, Geode is an in-memory key-value store whereas
>> >> > Concerted is a write-less/read-many engine. Concerted will complement
>> >> > Geode and increase the use cases Geode can support.
>> >> >
>> >> > A major objective for Concerted is supporting OLAP workloads and data
>> >> > warehouses with in-memory performance and highly performant reads and
>> >> > joins. Concerted will be collaborating with many open source projects
>> >> > such as Apache HAWQ (incubating), Apache Hive, Apache Tajo etc. to
>> >> > support their OLAP workloads, hence enabling them to support a larger
>> >> > set of use cases with better throughput. For example, a star schema
>> >> > in Hive will benefit from having dimension tables in Concerted, with
>> >> > highly efficient and scalable reads and very fast joins. The same
>> >> > applies to similar workloads in Tajo.
>> >> >
>> >> > Concerted will fit many other use cases in the Apache spectrum as
>> >> > well. For example, Concerted can be used with Apache Geode for
>> >> > in-memory aggregation indexing. Concerted can also be used with
>> >> > Apache Flink to stream real-time data into memory, perform in-memory
>> >> > aggregation and then perform batch processing for efficiency.
>> >> >
>> >> >
>> >> > == An Excessive Fascination with the Apache Brand ==
>> >> > We believe that the "Apache Way" governance model will provide
>> >> > additional help to us in finding contributors and growing the
>> >> > community. The community and development process will make this
>> >> > project more stable and help establish ubiquitous APIs. In addition,
>> >> > Concerted is looking to support multiple Apache projects in their use
>> >> > cases and accelerate their performance while soliciting their support
>> >> > in development of the project. We will not be using the Apache brand
>> >> > for excessive branding or with any commercial aspects of Concerted.
>> >> > The Apache brand will primarily be used for community building.
>> >> >
>> >> > = Documentation =
>> >> > Public documents are currently in development and will be published
>> >> > soon.
>> >> >
>> >> > = Initial Source =
>> >> > The initial source is written in C++ and is under heavy development.
>> >> > It will be restructured and released publicly.
>> >> > We understand that there might be concerns about the GitHub source
>> >> > being developed by only a single person and about development not
>> >> > happening after 2013. The source on GitHub is only the source
>> >> > initially developed as an independent project, hence the limitation.
>> >> > However, because the project has been present on GitHub for a while
>> >> > now, it has attracted attention, and people have been using and
>> >> > developing it locally. For example, Ingenium Data Systems took an
>> >> > interest in the project, developed it locally and used it in an
>> >> > upcoming product they are going to release soon. The project now
>> >> > wants to bring together all independent development efforts and help
>> >> > attract people to grow the community and project. We are currently in
>> >> > the process of updating the GitHub repository and making branches for
>> >> > all local development efforts.
>> >> >
>> >> > = Source and Intellectual Property Submission Plan =
>> >> >
>> >> > We intend the entire code base to be licensed under the Apache
>> >> > License, Version 2.0.
>> >> >
>> >> > = External Dependencies =
>> >> > Currently, Concerted depends only on the g++ compiler and pthreads.
>> >> > pthreads will be replaced by Boost in the next release.
>> >> >
>> >> > = Cryptography =
>> >> >
>> >> > N/A
>> >> >
>> >> > = Required Resources =
>> >> > == Mailing List ==
>> >> >  * private@concerted.incubator.apache.org (moderated subscriptions)
>> >> >  * commits@concerted.incubator.apache.org
>> >> >  * dev@concerted.incubator.apache.org
>> >> >  * issues@concerted.incubator.apache.org
>> >> >
>> >> > == Git Repository ==
>> >> >
>> >> > https://git-wip-us.apache.org/repos/asf/incubator-concerted.git
>> >> >
>> >> > == Issue Tracking ==
>> >> > Jira Concerted (CONCERTED)
>> >> >
>> >> > == Other Resources ==
>> >> >  * Continuous Integration
>> >> >   * Jenkins
>> >> >  * Wiki
>> >> >   * cwiki.apache.org/confluence/display/CONCERTED
>> >> >
>> >> > = Initial Committers =
>> >> >  * Roman Shaposhnik <rvs AT apache DOT org>
>> >> >  * Daniel Dai <daijy AT apache DOT org>
>> >> >  * Jake Farrell <jfarrell AT apache DOT org>
>> >> >  * Lars Hofhansl <larsh AT apache DOT org>
>> >> >  * Julian Hyde <jhyde AT apache DOT org>
>> >> >  * Chris Nauroth <cnauroth AT hortonworks DOT com>
>> >> >  * Pavel Stehule <pavel DOT stehule AT gmail.com>
>> >> >  * Amrish <amrishs AT ingeniumsys DOT com>
>> >> >  * Nupur S <nupurs AT ingeniumsys DOT com>
>> >> >  * Atri Sharma <atri AT apache DOT org>
>> >> >  * Nishith Singhal <nishsinghal AT gmail DOT com>
>> >> >  * Michael Down <michael AT dowuk DOT com>
>> >> >  * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
>> >> >  * Wang Albert <albertwang87 AT gmail DOT com>
>> >> >  * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
>> >> >  * Kris Popat <krispopat AT apache DOT org>
>> >> >  * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
>> >> >
>> >> > = Affiliations =
>> >> >  * Roman Shaposhnik (Pivotal)
>> >> >  * Daniel Dai (HortonWorks)
>> >> >  * Jake Farrell (Acquia)
>> >> >  * Lars Hofhansl (Salesforce)
>> >> >  * Julian Hyde (HortonWorks)
>> >> >  * Chris Nauroth (HortonWorks)
>> >> >  * Pavel Stehule (GoodData)
>> >> >  * Amrish (Ingenium Data Systems)
>> >> >  * Nupur S (Ingenium Data Systems)
>> >> >  * Atri Sharma (Barclays)
>> >> >  * Nishith Singhal (Wipro)
>> >> >  * Michael Down (Barclays)
>> >> >  * Vijayakumar Ramdoss (EMC)
>> >> >  * Wang Albert (Lehigh University)
>> >> >  * Hans-Jurgen Schonig (CyberTec)
>> >> >  * Kris Popat (CETIS LLP)
>> >> >  * Ayrton Gomesz (IQLabs)
>> >> >
>> >> > The nominated mentors are employees of HortonWorks, Acquia, and
>> >> > Salesforce.
>> >> >
>> >> >  * Daniel Dai (HortonWorks)
>> >> >  * Jake Farrell (Acquia)
>> >> >  * Lars Hofhansl (Salesforce)
>> >> >  * Julian Hyde (HortonWorks)
>> >> >  * Chris Nauroth (HortonWorks)
>> >> >
>> >> > = Sponsors =
>> >> >
>> >> > == Champion ==
>> >> >
>> >> >  * Roman Shaposhnik (rvs AT apache DOT org)
>> >> >
>> >> > == Nominated Mentors ==
>> >> >
>> >> >  * Daniel Dai <daijy AT apache DOT org>
>> >> >  * Jake Farrell <jfarrell AT apache DOT org>
>> >> >  * Lars Hofhansl <larsh AT apache DOT org>
>> >> >  * Julian Hyde <jhyde AT apache DOT org>
>> >> >  * Chris Nauroth <cnauroth AT hortonworks DOT com>
>> >> >
>> >> > == Sponsoring Entity ==
>> >> > Apache Incubator
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> >> For additional commands, e-mail: general-help@incubator.apache.org
>> >>
>> >>
>>
