incubator-general mailing list archives

From Sijie Guo <si...@apache.org>
Subject [RESULT] [VOTE] Accept DistributedLog into the Apache Incubator
Date Fri, 24 Jun 2016 20:56:51 GMT
The results are in and voting is now closed. The votes were ...

[15] +1 Accept DistributedLog into the Apache Incubator

Sijie Guo
Jia Zhai
Naresh Agarwal
Debo Dutta
Tsuyoshi (ozawa@)
Flavio Junqueira (binding)
Chris Douglas (binding)
Henry Saputra (binding)
Josh Elser (binding)
Mahak Patidar
Dave Rusek
Stevo Slavic
Chris Nauroth (binding)
Suneel Marthi (binding)
Jakob Homan (binding)

[0] +0 Abstain.
[0] -1 Do not accept DistributedLog into the Apache Incubator because ...

DistributedLog has been accepted into the Incubator!

Thanks to everyone who took the time to look at the project and vote!

The vote thread can be found here:

http://mail-archives.apache.org/mod_mbox/incubator-general/201606.mbox/%3CCAO2yDyamYeMZ892GdwjxGn_J-WKgUcOLqfudA4YyU4nqZVmaKA%40mail.gmail.com%3E


---------- Forwarded message ----------
From: Sijie Guo <sijie@apache.org>
Date: Mon, Jun 20, 2016 at 10:11 PM
Subject: [VOTE] Accept DistributedLog into the Apache Incubator
To: general@incubator.apache.org


Hello All,

Following the discussion thread, I would like to call a VOTE on accepting
DistributedLog into the Apache Incubator.

[] +1 Accept DistributedLog into the Apache Incubator
[] +0 Abstain.
[] -1 Do not accept DistributedLog into the Apache Incubator because ...

This vote will be open for at least 72 hours.

The proposal follows; it is also available on the wiki:
https://wiki.apache.org/incubator/DistributedLogProposal

Here is my +1.

Thanks,
Sijie

= Abstract =
DistributedLog is a high-performance replicated log service. It offers
durability, replication and strong consistency, providing a fundamental
building block for reliable distributed systems, e.g.
replicated state machines, general pub/sub systems, distributed databases
and distributed queues.

See “Building Distributedlog - Twitter’s high performance replicated log
service” for details:
https://blog.twitter.com/2015/building-distributedlog-twitter-s-high-performance-replicated-log-service

= Proposal =
We propose to contribute the DistributedLog codebase and associated
artifacts (e.g. documentation, web-site content etc.) to the Apache
Software Foundation with the intent of forming a productive, meritocratic
and open community around DistributedLog’s continued development,
according to the ‘Apache Way’.

= Background =
Engineers at Twitter began developing DistributedLog in early 2013.
DistributedLog was described in a Twitter engineering blog post and
presented at the Messaging Meetup in Sep 2015. It was released as an
Apache-licensed open-source project on GitHub in May 2016.

DistributedLog is a high-performance replicated log service, which provides
simple stream-oriented abstractions over log-segments and offers
durability, replication and strong consistency for building reliable
distributed systems. The features offered by DistributedLog include:

 * Simple high-level, stream oriented interface
 * Naming and metadata scheme for managing streams and other entities
 * Log data management policies, including data segmentation and data
retention
 * Fast write pipeline leveraging batching and compression
 * Fast read mechanism leveraging long-poll and read-ahead caching
 * Service tiers supporting writer fan-in and reader fan-out
 * Geo-replicated logs
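
As a rough illustration of the segmentation and retention items above, here
is a minimal in-memory sketch in Python. It is not the DistributedLog API:
the names ToyLogStream, Segment, max_records_per_segment and max_segments
are hypothetical, and real log segments are stored as BookKeeper ledgers
rather than Python lists. It only shows how a stream can roll to a new
segment when the current one fills up, and how retention can drop whole
old segments while sequence numbers stay stable for readers.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Segment:
    """A contiguous run of records; first_seq is the sequence number of records[0]."""
    first_seq: int
    records: List[bytes] = field(default_factory=list)


class ToyLogStream:
    """Hypothetical in-memory log stream made of segments (illustration only)."""

    def __init__(self, max_records_per_segment: int = 3, max_segments: int = 2) -> None:
        self.max_records_per_segment = max_records_per_segment
        self.max_segments = max_segments
        self.next_seq = 0
        self.segments: List[Segment] = [Segment(first_seq=0)]

    def append(self, record: bytes) -> int:
        """Append a record, rolling to a new segment when the current one is full."""
        current = self.segments[-1]
        if len(current.records) >= self.max_records_per_segment:
            current = Segment(first_seq=self.next_seq)
            self.segments.append(current)
            self._apply_retention()
        current.records.append(record)
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def _apply_retention(self) -> None:
        """Drop the oldest whole segments once more than max_segments exist."""
        while len(self.segments) > self.max_segments:
            self.segments.pop(0)

    def read_from(self, seq: int) -> Iterator[Tuple[int, bytes]]:
        """Yield (sequence number, record) pairs in order, starting at seq."""
        for segment in self.segments:
            for offset, record in enumerate(segment.records):
                record_seq = segment.first_seq + offset
                if record_seq >= seq:
                    yield record_seq, record
```

With these toy settings, appending seven records leaves two segments
starting at sequence numbers 3 and 6; the first segment has been truncated
by the retention rule, and a reader asking for sequence 0 starts at the
oldest retained record.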

DistributedLog’s most important benefit is high performance with a strong
durability guarantee, making it well suited to a range of workloads, from
distributed database journaling to real-time stream
computing. Its modern, layered architecture makes it easy to run the
service tiers in multi-tenant datacenter environments such as Apache Mesos
or cloud environments such as EC2.

= Rationale =
DistributedLog is designed to provide fundamental features such as high
performance, durability and strong consistency to anyone who is
building reliable distributed systems, in a simple and efficient way.

We believe that the ASF is the right venue to foster an open-source
community around DistributedLog’s development. We expect that
DistributedLog will benefit from collaboration with related Apache
projects, and under the auspices of the ASF will attract talented
contributors who will push DistributedLog’s development forward at a faster
pace.

We believe that the timing is right for DistributedLog’s development to
move to the ASF: DistributedLog has already run in production at Twitter
for 3 years and served various workloads including a distributed database
journal, reliable cross datacenter replication, search ingestion,
and general pub/sub messaging. The project is stable. We are excited to see
where an ASF-based community can take DistributedLog.

= Current Status =
DistributedLog is a stable project that has been used in production at
Twitter for 3 years. The source code is public at github.com/twitter, which
will seed the Apache git repository.

= Meritocracy =
We understand the central importance of meritocracy to the Apache Way. We
will work to establish a welcoming, fair and meritocratic community.
Several companies have already expressed interest in this project, and we
intend to invite additional developers to participate. We look forward to
growing a rich user and developer community.

= Community =
There is a strong need for a performant replicated log service for
applications such as distributed databases, distributed transactional
systems, replicated-state-machines and pub/sub messaging/queuing. We want
to attract more developers to the project, and we believe that the ASF’s
open and meritocratic philosophy will help us with this. We note the
success of other similar projects already part of the ASF, like Kafka.

= Core Developers =
DistributedLog is actively developed within Twitter. Most of the developers
are from Twitter. Many of them are committers or PMC members of Apache
BookKeeper. Others aren’t currently affiliated with ASF so they will
require new ICLAs.

= Alignment =
DistributedLog is related to several other Apache projects:

 * DistributedLog stores log segments as Ledgers in Apache BookKeeper.
 * DistributedLog uses Apache ZooKeeper for naming and metadata management
and tracking the ownership of logs.
 * DistributedLog uses Apache Thrift as its RPC and serialization framework.
 * In the long term, DistributedLog’s data will be stored in Apache Hadoop
clusters powered by the HDFS filesystem for archives and backup.

= Known Risks =
== Orphaned Products ==
DistributedLog is used as the fundamental messaging infrastructure at
Twitter. It has been serving production traffic for online database
systems, search ingestion and a general pub/sub system. Twitter remains
committed to developing and supporting the project. Twitter has a strong
track record in standing behind projects that were contributed to the ASF
by its employees, including Apache Mesos, Apache Aurora, Apache BookKeeper
and Apache Hadoop. Many companies are interested in using it in
production.

== Inexperience with Open Source ==
The core developers of DistributedLog are committers of Apache BookKeeper.
Although the other committers on the initial list have less experience
with the ASF, they are already active in the Apache BookKeeper
community. We are confident that the project can be run in accordance with
Apache principles on an ongoing basis.

== Homogeneous Developers ==
The initial committers are from Twitter. We hope to encourage contributions
from other developers and grow them into committers after they have had
time to continue their contributions.

== Reliance on Salaried Developers ==
Many of DistributedLog’s initial set of committers work full-time on
DistributedLog, and are paid to do so. However, as mentioned elsewhere, we
anticipate growth in the developer community, which we hope will include
people from industry, hobbyists, and academics who have an interest in
distributed messaging systems.

== Relationships with Other Apache Products ==
DistributedLog uses Apache BookKeeper to store log segments and Apache
ZooKeeper to store log metadata and manage log namespaces. It provides an
end-to-end solution for replicated logs, to make building reliable
distributed systems much easier. Unlike Kafka or ActiveMQ, DistributedLog
is not a full-fledged pub/sub, queuing or messaging system. Instead, it
aims to provide a fundamental building block for other distributed
systems, offering durability, replication and consistency. It could
therefore be used by other distributed systems as a transaction log for
replicated state machines (e.g., HDFS NameNode), a WAL for distributed
databases (e.g., HBase), a journal for in-memory services (e.g., Kestrel)
and even a storage backend for a full-fledged messaging system.

== An Excessive Fascination with the Apache Brand ==
DistributedLog builds on two existing top-level projects, Apache BookKeeper
and Apache ZooKeeper. Some of the core developers actively participate in
both projects and understand well the implications of being hosted by
Apache. We would like this project to build on the same core values of ASF
and to grow a community based on meritocracy. Also, there are several other
projects already hosted by ASF in this space of reliable messaging and that
overlap with DistributedLog in interests and scope. Consequently, the
combination of all these observations makes us believe that DistributedLog
should be hosted by the ASF.

= Documentation =
Building DistributedLog: Twitter’s high performance replicated log service (
https://blog.twitter.com/2015/building-distributedlog-twitter-s-high-performance-replicated-log-service
)

Documentation is located at http://distributedlog.io.

= Initial Source =
DistributedLog’s initial source contribution will come from
http://github.com/twitter/distributedlog/.

= External Dependencies =
DistributedLog depends upon a number of third-party libraries, which we
list below.

 * Apache BookKeeper (Apache Software License v2.0)
 * Apache Commons (Apache Software License v2.0)
 * Apache Maven (Apache Software License v2.0)
 * Apache Thrift (Apache Software License v2.0)
 * Apache ZooKeeper (Apache Software License v2.0)
 * Google Guava (Apache Software License v2.0)
 * Mockito (MIT License)
 * Junit (Eclipse Public License 1.0)
 * LZ4-java (Apache Software License v2.0)
 * SLF4J (MIT License)
 * Twitter Finagle (Apache Software License v2.0)
 * Twitter Scrooge (Apache Software License v2.0)
 * Twitter Util (Apache Software License v2.0)

= Required Resources =
We request that the following resources be created for the project to use:

== Mailing lists ==
 * private@distributedlog.incubator.apache.org (moderated subscriptions)
 * commits@distributedlog.incubator.apache.org
 * dev@distributedlog.incubator.apache.org
 * user@distributedlog.incubator.apache.org

== Git repository ==
https://git.apache.org/distributedlog.git

== JIRA instance ==
JIRA project DLOG (DLOG or DL)

= Initial Committers =
 * Sijie Guo (Apache BookKeeper Committer, Twitter)
 * Robin Dhamankar (Apache BookKeeper Committer)
 * Leigh Stewart (Twitter)
 * Dave Rusek (Twitter)
 * Honggang Zhang (Twitter)
 * Jordan Bull (Twitter)
 * Satish Kotha (Twitter)
 * Aniruddha Laud
 * Franck Cuny (Twitter)
 * Eitan Adler (Twitter)

== Affiliations ==
Most of the initial committers are employees of Twitter, except Robin
Dhamankar and Aniruddha Laud.

= Sponsors =
== Champion ==
Flavio Junqueira

== Nominated Mentors ==
 * Flavio Junqueira
 * Chris Nauroth
 * Henry Saputra

= Sponsoring Entity =
We ask the Apache Incubator PMC to sponsor this proposal.
