incubator-general mailing list archives

From Atri Sharma <atri.j...@gmail.com>
Subject Re: [VOTE] Accept Concerted into the Apache Incubator
Date Fri, 09 Oct 2015 21:09:15 GMT
Hi,

Please find answers below:

1) The main source code on GitHub has not been updated for a while. However,
the original core was written in 2013 and has been open source since then.
As we discussed earlier, the current code base is only a starting point for
full development: it will first be integrated with the work done
independently in silos and then used as the starting implementation.

2) When optimized, the JNI native API can provide great performance (I have
written an application using it that has been running on production systems
for many years). I think we can still provide a high-performance API to the
C++ core, and that is something I am personally working on right now.
On 10 Oct 2015 02:31, "Julian Hyde" <jhyde@apache.org> wrote:

> I have agreed to be a mentor to Concerted and I think it is an
> interesting idea. I am inclined to vote for it entering the incubator.
>
> However since the project has not released any source code yet, there
> are a couple of questions I'd like to get answered for the record:
>
> 1. How many lines of existing code are there? What is their approximate
> age?
>
> 2. Concerted is in C/C++ but you mention interfacing with JVM-based
> products like Hive. How would you interface with other languages? Is
> it a goal of the project to create APIs to other languages such as
> Java? Would access from those languages be as efficient as native
> access?
>
> I apologize that I didn't bring these up in the discussion thread.
>
> Julian
>
>
> On Fri, Oct 9, 2015 at 11:53 AM, Ayrton Gomesz <com.ayrton@gmail.com> wrote:
> > +1
> > @henry.saputra thanks man
> > On Oct 9, 2015 5:50 PM, "Henry Saputra" <henry.saputra@gmail.com> wrote:
> >
> >> +1 (binding)
> >> Good luck guys!
> >>
> >> On Fri, Oct 9, 2015 at 8:55 AM, Atri Sharma <atri@apache.org> wrote:
> >> > Hi all,
> >> >
> >> > Following the discussion about Concerted, I would like to call a vote for
> >> > accepting Concerted as a new Incubator project.
> >> >
> >> > The proposal text is included below, and available on the wiki:
> >> >
> >> > https://wiki.apache.org/incubator/ConcertedProposal
> >> >
> >> > The vote is open for 72 hours:
> >> >
> >> > [ ] +1 accept Concerted in the Incubator
> >> > [ ] ±0
> >> > [ ] -1 (please give reason)
> >> >
> >> > Regards,
> >> >
> >> > Atri
> >> >
> >> > = Abstract =
> >> >
> >> > Concerted is an in-memory, write-less/read-more engine that aims to
> >> > provide extreme read performance with a very high degree of concurrency
> >> > and scalability, while keeping its own resource footprint to a minimum.
> >> >
> >> > = Proposal =
> >> > Concerted is built on the principle that a new type of workload now
> >> > dominates the scene and needs to be supported: analytical workloads over
> >> > large data sets, run on large clusters or high-powered machines. Such
> >> > workloads depend on the ability to query large data sets efficiently and
> >> > with high concurrency while maintaining semantics such as immediate
> >> > consistency. An in-memory engine designed for read-heavy queries, with
> >> > support for aggregation through features such as a multidimensional
> >> > representation of tuples, will accelerate many use cases in large-scale
> >> > analytics.
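As a rough illustration of how a multidimensional representation of tuples can support aggregation, here is a minimal sketch assuming a dense two-dimensional grid of numeric measures whose two dimensions (for example region and month) are already encoded as small integers. The class `Cube2D` and its methods are hypothetical names chosen for this example; they are not Concerted's API, and the proposal does not describe its actual layout.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical illustration (not Concerted's API): a dense 2-D "cube"
// of numeric measures. Both dimensions are assumed to be pre-encoded
// as small integers, e.g. region 0..R-1 and month 0..M-1.
class Cube2D {
public:
    Cube2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols, 0.0) {}

    // Accumulate one tuple's measure into the cell named by its coordinates.
    void add(std::size_t r, std::size_t c, double value) {
        if (r >= rows_ || c >= cols_) throw std::out_of_range("Cube2D::add");
        cells_[r * cols_ + c] += value;
    }

    // Roll up along the second dimension: one total per row.
    std::vector<double> sumByRow() const {
        std::vector<double> out(rows_, 0.0);
        for (std::size_t r = 0; r < rows_; ++r)
            for (std::size_t c = 0; c < cols_; ++c)
                out[r] += cells_[r * cols_ + c];
        return out;
    }

    // Roll up along the first dimension: one total per column.
    std::vector<double> sumByCol() const {
        std::vector<double> out(cols_, 0.0);
        for (std::size_t r = 0; r < rows_; ++r)
            for (std::size_t c = 0; c < cols_; ++c)
                out[c] += cells_[r * cols_ + c];
        return out;
    }

private:
    std::size_t rows_, cols_;
    std::vector<double> cells_;  // row-major: cell (r, c) at r * cols_ + c
};
```

For example, on a 2 x 3 cube, after `add(0, 0, 10.0)`, `add(0, 2, 5.0)` and `add(1, 1, 7.5)`, `sumByRow()` yields {15.0, 7.5}. A dense layout turns roll-ups into sequential scans; very sparse dimensions would call for a different structure.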
> >> >
> >> > Concerted is based on the belief that the application developer
> >> > understands the application best. Massive read scaling should be
> >> > available on demand, and flexible enough that users can decide which
> >> > data representation and access pattern suit their current requirements.
> >> > Hence, Concerted is not built on a traditional client/server model.
> >> > Instead, Concerted provides users with an API that can be used to load,
> >> > read, update and delete data, and the user chooses which data structure
> >> > to use for the current requirements. Every API call goes through
> >> > Concerted's internal subsystems, such as the lock manager, transaction
> >> > manager and cache manager, which ensure that reads scale well across
> >> > all API calls.
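The API model described above (the user picks the data structure, then loads, reads, updates and deletes through calls that all pass through an internal lock manager) can be sketched as follows. This is a minimal, hypothetical illustration: it assumes a single C++17 `std::shared_mutex` stands in for the lock manager, so any number of readers proceed together while writers take an exclusive lock. The names `Store`, `load`, `read`, `update` and `remove` are invented for this example and are not Concerted's actual API; no transaction or cache manager is modeled.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: the caller picks the backing container (e.g. an
// ordered std::map or a hashed std::unordered_map). Every call goes
// through one reader-writer lock that stands in for a lock manager:
// reads share it, writes take it exclusively.
template <typename Container>
class Store {
public:
    using Key = typename Container::key_type;
    using Value = typename Container::mapped_type;

    // Load (insert) a new entry; returns false if the key already exists.
    bool load(const Key& k, Value v) {
        std::unique_lock<std::shared_mutex> lock(mu_);
        return data_.emplace(k, std::move(v)).second;
    }

    // Read under a shared lock, so concurrent readers do not block each other.
    std::optional<Value> read(const Key& k) const {
        std::shared_lock<std::shared_mutex> lock(mu_);
        auto it = data_.find(k);
        if (it == data_.end()) return std::nullopt;
        return it->second;
    }

    // Update an existing entry; returns false if the key is absent.
    bool update(const Key& k, Value v) {
        std::unique_lock<std::shared_mutex> lock(mu_);
        auto it = data_.find(k);
        if (it == data_.end()) return false;
        it->second = std::move(v);
        return true;
    }

    // Delete an entry; returns true if something was removed.
    bool remove(const Key& k) {
        std::unique_lock<std::shared_mutex> lock(mu_);
        return data_.erase(k) > 0;
    }

private:
    mutable std::shared_mutex mu_;
    Container data_;
};
```

For example, `Store<std::map<int, std::string>>` gives ordered keys, while `Store<std::unordered_map<std::string, int>>` favors point lookups. Because readers share the lock, many reads can run at once when writes are rare, which is the write-less/read-more pattern the proposal targets; a real lock manager would likely lock at a finer granularity than one lock per store.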
> >> >
> >> > Concerted is a do-it-yourself in-memory platform for building in-memory
> >> > supporting engines. The primary use case we have in mind is supporting
> >> > big data warehouses like Hive, but there are many other use cases for a
> >> > custom, highly scalable in-memory platform.
> >> >
> >> > The goal of this proposal is to build a community around an existing
> >> > code base that is available on GitHub and licensed under the Apache
> >> > License 2.0. Currently the community consists of existing Concerted
> >> > hackers, people who have been following and associated with the project
> >> > for a while, and database experts who are excited about building a
> >> > project like this. We hope that entering the Apache Incubator will help
> >> > us attract more contributors and connect with existing big data
> >> > projects such as Apache Hive, Apache HAWQ, Apache Storm, Apache Tajo,
> >> > Apache Spark and Apache Geode, drawing on their community base while
> >> > supporting their use cases with Concerted. We have discussed Concerted
> >> > with the founders of Apache Tajo, and they showed interest in using it
> >> > for some of their use cases.
> >> > = Background =
> >> > Relational databases were built with the cost of physical memory in
> >> > mind. That cost is far less relevant today, and physical memory is now
> >> > available on demand. Another driving factor behind Concerted is the
> >> > paradigm shift brought about by big data: disk I/O is more of a
> >> > bottleneck than ever before. By combining the read dominance of
> >> > analytical workloads with the speed of in-memory structures, Concerted
> >> > fits the current landscape. In-memory support for faster read-only
> >> > queries and joins will also be useful for OLAP workloads.
> >> >
> >> > = Rationale =
> >> > As explained above, large analytical workloads need a lightweight
> >> > in-memory engine that supports massive read concurrency, built-in
> >> > support for aggregations and analytics, extreme scalability and high
> >> > read performance, while itself remaining very light. Concerted aims to
> >> > meet these needs. It is designed and built around three objectives:
> >> >
> >> >
> >> > Performance
> >> >     To provide high-performance access to data across a large number of
> >> > rows, Concerted uses an efficient data representation and in-memory
> >> > indexing, coupled with high-performance and custom transactions,
> >> > lightweight locking, lockless techniques and an intelligent lock
> >> > manager.
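The proposal does not say which lockless techniques Concerted uses. One common technique for read-mostly data is copy-on-write snapshot publication, sketched below under that assumption: readers atomically load a pointer to an immutable snapshot and never touch the writers' mutex, while a (rare) writer copies the snapshot, modifies the copy and atomically publishes it. `SnapshotTable` is a hypothetical name. The sketch uses the C++11 `std::atomic_load`/`std::atomic_store` overloads for `std::shared_ptr` (deprecated in C++20 in favor of `std::atomic<std::shared_ptr<T>>`); note that standard libraries may implement these with an internal lock, so a strictly lock-free design would need something like epoch-based reclamation or hazard pointers.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical sketch of copy-on-write snapshot publication.
// Readers: atomically load the current snapshot and search it without
// taking the writer mutex. Writers: serialize among themselves, copy the
// snapshot, modify the copy and atomically publish it.
class SnapshotTable {
public:
    using Map = std::map<std::string, long>;

    SnapshotTable() : snap_(std::make_shared<Map>()) {}

    std::optional<long> get(const std::string& key) const {
        // Holding the shared_ptr keeps this snapshot alive for the read,
        // even if a writer publishes a newer one meanwhile.
        std::shared_ptr<const Map> s = std::atomic_load(&snap_);
        auto it = s->find(key);
        if (it == s->end()) return std::nullopt;
        return it->second;
    }

    void put(const std::string& key, long value) {
        std::lock_guard<std::mutex> guard(writeMu_);  // one writer at a time
        auto next = std::make_shared<Map>(*std::atomic_load(&snap_));
        (*next)[key] = value;
        std::atomic_store(&snap_, std::shared_ptr<const Map>(std::move(next)));
    }

private:
    std::shared_ptr<const Map> snap_;  // accessed only via atomic_load/store
    std::mutex writeMu_;
};
```

The trade-off is that every write copies the whole map, which is only reasonable when writes are infrequent compared with reads, as in the write-less/read-more workloads described here.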
> >> >
> >> > Scalability
> >> >     Concerted is built with extreme concurrency and scalability in mind.
> >> >
> >> > Efficiency
> >> >     Concerted aims to deliver the expected performance under a wide
> >> > variety of workloads while keeping its footprint as low as possible.
> >> >
> >> > = Initial Goals =
> >> > The initial goal is to leverage an existing code base and invest in
> >> > building a community around the project. We anticipate a lot of initial
> >> > restructuring of the existing code to make it easier for new
> >> > contributors to join and to minimize ramp-up time. We plan to approach
> >> > this refactoring in a fully transparent, community-driven way, thus
> >> > practicing the "Apache Way" governance model from the start.
> >> >
> >> > Various contributors are pushing their individual changes to branches
> >> > in the GitHub repository, and our first major goal will be to merge all
> >> > of those changes into the master branch.
> >> >
> >> > = Current Status =
> >> > Concerted is currently being restructured to suit the needs of an open
> >> > source project. The current source is available at
> >> > https://github.com/atris/Concerted (please note that the updated code
> >> > base is not yet on GitHub). Concerted is licensed under the Apache
> >> > License 2.0. Most of the code base is implemented in C and C++, and its
> >> > external dependencies are listed below.
> >> >
> >> > == Meritocracy ==
> >> >
> >> > We plan to drive the technical roadmap and implementation in a fully
> >> > transparent, community-driven way, soliciting feedback from all
> >> > community members and building a consensus-driven approach to evolving
> >> > the code base and the community itself. Users and new contributors will
> >> > be treated with respect and welcomed. By participating in the community
> >> > and providing quality patches/support that move the project forward,
> >> > contributors will earn merit. They will also be encouraged to provide
> >> > non-code contributions (documentation, events, community management,
> >> > etc.) and will gain merit for doing so. Those with a proven track record
> >> > of support and quality will be encouraged to become committers.
> >> >
> >> > == Community ==
> >> > In-memory processing is at the cutting edge, and a new community around
> >> > performance-oriented systems, and around enhancing relational database
> >> > performance with complete in-memory OLTP engines, will be of great
> >> > benefit. We therefore expect data warehousing projects and communities,
> >> > as well as projects and companies looking for high OLTP performance, to
> >> > take part. In addition, Ingenium Data Systems is building products
> >> > around Concerted and will have salaried developers contribute to the
> >> > project as part of their job responsibilities.
> >> >
> >> > == Core Developers ==
> >> > The core developers are a diverse group, many of whom are very
> >> > experienced in open source and the Apache Hadoop ecosystem.
> >> > Specifically, Atri is an Apache Apex committer, and Atri and Pavel are
> >> > major contributors to the PostgreSQL project. Atri is also a committer
> >> > on other open source projects.
> >> >
> >> >  * Amrish <amrishs AT ingeniumsys DOT com>
> >> >  * Nupur S <nupurs AT ingeniumsys DOT com>
> >> >  * Pavel Stehule <pavel DOT stehule AT gmail.com>
> >> >  * Atri Sharma <atri AT apache DOT org>
> >> >  * Nishith Singhal <nishsinghal AT gmail DOT com>
> >> >  * Michael Down <michael AT dowuk DOT com>
> >> >  * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
> >> >  * Wang Albert <albertwang87 AT gmail DOT com>
> >> >  * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
> >> >  * Kris Popat <krispopat AT apache DOT org>
> >> >  * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
> >> >
> >> > == Alignment ==
> >> > Concerted will be helpful to systems like Tajo, which can benefit from
> >> > in-memory structures optimized for heavy reads and joins (e.g. on
> >> > dimension tables). In addition, Concerted will benefit projects looking
> >> > for an in-memory relational database as a metadata store, which is the
> >> > case for most Apache big data projects. We expect Apache HAWQ
> >> > (incubating), Apache Hive, Apache Storm and Apache Tajo to use Concerted
> >> > as a supporting engine. For example, a data warehouse built on HAWQ,
> >> > Hive or Tajo can use Concerted as an in-memory engine for querying and
> >> > joining dimension tables.
> >> >
> >> > = Known Risks =
> >> >
> >> > == Orphaned Products ==
> >> > Most of the code was developed by a small group of core developers, which
> >> > poses a risk of the product becoming orphaned. However, the code base is
> >> > simple compared to other open source projects, and interest in Concerted
> >> > has grown considerably over the years, with many computer professionals
> >> > expressing interest in the project and building use cases on it.
> >> > Specifically, some projects were done around Concerted at JIIT, Noida
> >> > (an engineering school), and Wang, a student at Lehigh University, has
> >> > been following Concerted's progress for many years. The core developers
> >> > are committed to this project, and since the code base is simple, future
> >> > committers will ramp up quickly, which mitigates the risk. Moreover,
> >> > Ingenium Data Systems is launching a product based on Concerted and will
> >> > have all of its salaried developers contribute to Concerted as part of
> >> > their job functions.
> >> >
> >> > == Inexperience with Open Source ==
> >> > Most of the initial committers have experience working on open source
> >> > projects. In particular, Atri is an active member of many open source
> >> > projects.
> >> >
> >> > == Homogeneous Developers ==
> >> > Although the initial core developers were based in India, the community
> >> > now consists of computer professionals from various parts of the world,
> >> > so diversity should not be an issue. In addition, we will document the
> >> > project's internals in public-facing documents, which will make it
> >> > easier for more contributors to join.
> >> >
> >> > == Reliance on Salaried Developers ==
> >> > Concerted development is expected to occur on both salaried and
> >> > volunteer time. Nupur and Amrish work at Ingenium and are committed to
> >> > building this project along with their team. Atri, as the originator of
> >> > this project, will be actively working on it and is now promoting
> >> > Concerted for use in major data warehousing projects, since he is
> >> > involved in the architecture of data platforms. Other developers are
> >> > expected to contribute in their volunteer time. In addition, we will
> >> > work with various open source projects that will benefit from Concerted
> >> > and will involve those communities in Concerted's development as well.
> >> > For example, Apache Tajo has shown interest and will support development
> >> > of the project.
> >> >
> >> > == Relationships with Other Apache Products ==
> >> > Concerted has some functional overlap with Apache Geode (incubating).
> >> > However, Geode is an in-memory key-value store, whereas Concerted is a
> >> > write-less/read-many engine. Concerted will complement Geode and broaden
> >> > the range of use cases Geode can support.
> >> >
> >> > A major objective for Concerted is to support OLAP workloads and data
> >> > warehouses with in-memory performance and highly performant reads and
> >> > joins. Concerted will collaborate with open source projects such as
> >> > Apache HAWQ (incubating), Apache Hive and Apache Tajo to support their
> >> > OLAP workloads, enabling them to handle a larger set of use cases with
> >> > better throughput. For example, a star schema in Hive will benefit from
> >> > keeping its dimension tables in Concerted, where highly efficient and
> >> > scalable reads make joins very fast. The same applies to similar
> >> > workloads in Tajo.
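To make the star-schema case concrete, here is a minimal sketch of joining fact rows against a small in-memory dimension table with a hash join, where the dimension table is the build side and each fact row probes it. The `Product`/`Sale` structs, their fields and `joinRevenueByCategory` are hypothetical; the proposal does not specify how Hive or Tajo fact data would be connected to dimension tables held in Concerted.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical star-schema fragment: a small dimension table held in memory
// and fact rows that reference it by surrogate key.
struct Product {          // dimension row
    int id;
    std::string category;
};

struct Sale {             // fact row
    int productId;
    double amount;
};

// Hash join: build a hash index on the dimension table once, then probe it
// for every fact row and aggregate revenue per category. Fact rows whose key
// is missing from the dimension table are skipped (inner-join semantics).
std::map<std::string, double> joinRevenueByCategory(
        const std::vector<Product>& products, const std::vector<Sale>& sales) {
    std::unordered_map<int, const Product*> byId;
    byId.reserve(products.size());
    for (const Product& p : products) byId.emplace(p.id, &p);

    std::map<std::string, double> revenue;
    for (const Sale& s : sales) {
        auto it = byId.find(s.productId);
        if (it == byId.end()) continue;
        revenue[it->second->category] += s.amount;
    }
    return revenue;
}
```

Because the dimension table is small and read-only during the join, keeping its hash index resident in memory lets each fact row be matched with a single lookup, which is the kind of read-heavy access the proposal expects Concerted to serve.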
> >> >
> >> > Concerted will fit many other use cases across the Apache ecosystem as
> >> > well. For example, Concerted can be used with Apache Geode for in-memory
> >> > aggregation indexing. Concerted can also be used with Apache Flink to
> >> > stream real-time data into memory, perform in-memory aggregation and
> >> > then run batch processing for efficiency.
> >> >
> >> >
> >> > == An Excessive Fascination with the Apache Brand ==
> >> > We believe that the "Apache Way" governance model will help us find
> >> > contributors and grow the community. The community and development
> >> > process will make this project more stable and help establish widely
> >> > adopted APIs. In addition, Concerted aims to support multiple Apache
> >> > projects in their use cases and accelerate their performance, while
> >> > soliciting their support in developing the project. We will not use the
> >> > Apache brand for excessive branding or in connection with any
> >> > commercial aspects of Concerted. The Apache brand will primarily be used
> >> > for community building.
> >> >
> >> > = Documentation =
> >> > Public documents are currently in development and will be published soon.
> >> >
> >> > = Initial Source =
> >> > The initial source is written in C++ and is under heavy development. It
> >> > will be restructured and released publicly.
> >> > We understand that there may be concerns that the GitHub source was
> >> > developed by only a single person and that no development has happened
> >> > there since 2013. The source on GitHub is only the code initially
> >> > developed as an independent project, hence this limitation. However,
> >> > because the project has been on GitHub for a while now, it has attracted
> >> > attention, and people have been using and developing it locally. For
> >> > example, Ingenium Data Systems took an interest in the project,
> >> > developed it further locally and used it in an upcoming product that
> >> > they will release soon. The project now wants to consolidate all of
> >> > these independent development efforts and attract people to grow the
> >> > community and the project. We are currently in the process of updating
> >> > the GitHub repository and creating branches for all local development
> >> > efforts.
> >> >
> >> > = Source and Intellectual Property Submission Plan =
> >> >
> >> > We intend the entire code base to be licensed under the Apache License,
> >> > Version 2.0.
> >> >
> >> > = External Dependencies =
> >> > Currently, Concerted depends only on the g++ compiler and pthreads.
> >> > pthreads will be replaced by Boost in the next release.
> >> >
> >> > = Cryptography =
> >> >
> >> > N/A
> >> >
> >> > = Required Resources =
> >> > == Mailing Lists ==
> >> >  * private@concerted.incubator.apache.org (moderated subscriptions)
> >> >  * commits@concerted.incubator.apache.org
> >> >  * dev@concerted.incubator.apache.org
> >> >  * issues@concerted.incubator.apache.org
> >> >
> >> > == Git Repository ==
> >> >
> >> > https://git-wip-us.apache.org/repos/asf/incubator-concerted.git
> >> >
> >> > == Issue Tracking ==
> >> > Jira Concerted (CONCERTED)
> >> >
> >> > == Other Resources ==
> >> >  * Continuous Integration
> >> >   * Jenkins
> >> >  * Wiki
> >> >   * cwiki.apache.org/confluence/display/CONCERTED
> >> >
> >> > = Initial Committers =
> >> >  * Roman Shaposhnik <rvs AT apache DOT org>
> >> >  * Daniel Dai <daijy AT apache DOT org>
> >> >  * Jake Farrell <jfarrell AT apache DOT org>
> >> >  * Lars Hofhansl <larsh AT apache DOT org>
> >> >  * Julian Hyde <jhyde AT apache DOT org>
> >> >  * Chris Nauroth <cnauroth AT hortonworks DOT com>
> >> >  * Pavel Stehule <pavel DOT stehule AT gmail.com>
> >> >  * Amrish <amrishs AT ingeniumsys DOT com>
> >> >  * Nupur S <nupurs AT ingeniumsys DOT com>
> >> >  * Atri Sharma <atri AT apache DOT org>
> >> >  * Nishith Singhal <nishsinghal AT gmail DOT com>
> >> >  * Michael Down <michael AT dowuk DOT com>
> >> >  * Vijayakumar Ramdoss <vijayakumar DOT ramdoss AT emc DOT com>
> >> >  * Wang Albert <albertwang87 AT gmail DOT com>
> >> >  * Hans-Jurgen Schonig <postgres AT cybertec DOT at>
> >> >  * Kris Popat <krispopat AT apache DOT org>
> >> >  * Ayrton Gomesz <com DOT ayrton AT gmail DOT com>
> >> >
> >> > = Affiliations =
> >> >  * Roman Shaposhnik (Pivotal)
> >> >  * Daniel Dai (HortonWorks)
> >> >  * Jake Farrell (Acquia)
> >> >  * Lars Hofhansl (Salesforce)
> >> >  * Julian Hyde (HortonWorks)
> >> >  * Chris Nauroth (HortonWorks)
> >> >  * Pavel Stehule (GoodData)
> >> >  * Amrish (Ingenium Data Systems)
> >> >  * Nupur S (Ingenium Data Systems)
> >> >  * Atri Sharma (Barclays)
> >> >  * Nishith Singhal (Wipro)
> >> >  * Michael Down (Barclays)
> >> >  * Vijayakumar Ramdoss (EMC)
> >> >  * Wang Albert (Lehigh University)
> >> >  * Hans-Jurgen Schonig (CyberTec)
> >> >  * Kris Popat (CETIS LLP)
> >> >  * Ayrton Gomesz (IQLabs)
> >> >
> >> > The nominated mentors are employees of HortonWorks, Acquia, and
> >> > Salesforce.
> >> >
> >> >  * Daniel Dai (HortonWorks)
> >> >  * Jake Farrell (Acquia)
> >> >  * Lars Hofhansl (Salesforce)
> >> >  * Julian Hyde (HortonWorks)
> >> >  * Chris Nauroth (HortonWorks)
> >> >
> >> > = Sponsors =
> >> >
> >> > == Champion ==
> >> >
> >> >  * Roman Shaposhnik (rvs AT apache DOT org)
> >> >
> >> > == Nominated Mentors ==
> >> >
> >> >  * Daniel Dai <daijy AT apache DOT org>
> >> >  * Jake Farrell <jfarrell AT apache DOT org>
> >> >  * Lars Hofhansl <larsh AT apache DOT org>
> >> >  * Julian Hyde <jhyde AT apache DOT org>
> >> >  * Chris Nauroth <cnauroth AT hortonworks DOT com>
> >> >
> >> > == Sponsoring Entity ==
> >> > Apache Incubator
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> For additional commands, e-mail: general-help@incubator.apache.org
> >>
> >>
>
