distributedlog-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sijie Guo <sij...@twitter.com.INVALID>
Subject Re: [Discuss] Transaction Support
Date Wed, 19 Oct 2016 01:45:00 GMT
Sound good to me. look forward to the detailed proposal.

(I don't mind the format if it makes things easier to you)

Sijie

On Friday, October 14, 2016, Xi Liu <xi.liu.ant@gmail.com> wrote:

> Thank you, Sijie
>
> We have some internal discussions to sort out some details. We are ready to
> collaborate with the community for adding the transaction support in DL.
> We'd like to share more.
>
> I created a proposal wiki here -
> https://cwiki.apache.org/confluence/display/DL/DP-1+-+
> DistributedLog+Transaction+Support
>
> (I followed KIP format and named it as DP (DistributedLog Proposal - DP is
> also short for Dynamic Programming). I don't know if you guys like this
> name or not. Feel free to change it :D)
>
> I basically put my initial email as the content there so far. Once we
> finished our final discussion, I will update with more details. At the same
> time, any comments are welcome.
>
> - Xi
>
>
>
> On Sat, Oct 8, 2016 at 6:58 AM, Sijie Guo <sijie@apache.org <javascript:;>>
> wrote:
>
> > Xi,
> >
> > I just granted you the edit permission.
> >
> > - Sijie
> >
> > On Fri, Oct 7, 2016 at 10:34 AM, Xi Liu <xi.liu.ant@gmail.com
> <javascript:;>> wrote:
> >
> > > I still can not edit the wiki. Can any of the pmc members grant me the
> > > permissions?
> > >
> > > - Xi
> > >
> > > On Sat, Sep 17, 2016 at 10:35 PM, Xi Liu <xi.liu.ant@gmail.com
> <javascript:;>> wrote:
> > >
> > > > Sijie,
> > > >
> > > > I attempted to create a wiki page under that space. I found that I am
> > not
> > > > authorized with edit permission.
> > > >
> > > > Can any of the committers grant me the wiki edit permission? My
> account
> > > is
> > > > "xi.liu.ant".
> > > >
> > > > - Xi
> > > >
> > > >
> > > > On Tue, Sep 13, 2016 at 9:26 AM, Sijie Guo <sijie@apache.org
> <javascript:;>> wrote:
> > > >
> > > >> This sounds interesting ... I will take a closer look and give my
> > > comments
> > > >> later.
> > > >>
> > > >> At the same time, do you mind creating a wiki page to put your idea
> > > there?
> > > >> You can add your wiki page under
> > > >> https://cwiki.apache.org/confluence/display/DL/Project+Proposals
> > > >>
> > > >> You might need to ask in the dev list to grant the wiki edit
> > permissions
> > > >> to
> > > >> you once you have a wiki account.
> > > >>
> > > >> - Sijie
> > > >>
> > > >>
> > > >> On Mon, Sep 12, 2016 at 2:20 AM, Xi Liu <xi.liu.ant@gmail.com
> <javascript:;>> wrote:
> > > >>
> > > >> > Hello,
> > > >> >
> > > >> > I asked the transaction support in distributedlog user group
two
> > > months
> > > >> > ago. I want to raise this up again, as we are looking for using
> > > >> > distributedlog for building a transactional data service. It
is a
> > > major
> > > >> > feature that is missing in distributedlog. We have some ideas
to
> add
> > > >> this
> > > >> > to distributedlog and want to know if they make sense or not.
If
> > they
> > > >> are
> > > >> > good, we'd like to contribute and develop with the community.
> > > >> >
> > > >> > Here are the thoughts:
> > > >> >
> > > >> > -------------------------------------------------
> > > >> >
> > > >> > From our understanding, DL can provide "at-least-once" delivery
> > > semantic
> > > >> > (if not, please correct me) but not "exactly-once" delivery
> > semantic.
> > > >> That
> > > >> > means that a message can be delivered one or more times if the
> > reader
> > > >> > doesn't handle duplicates.
> > > >> >
> > > >> > The duplicates come from two places, one is at writer side (this
> > > assumes
> > > >> > using write proxy not the core library), while the other one
is at
> > > >> reader
> > > >> > side.
> > > >> >
> > > >> > - writer side: if the client attempts to write a record to the
> write
> > > >> > proxies and gets a network error (e.g timeouts) then retries,
the
> > > >> retrying
> > > >> > will potentially result in duplicates.
> > > >> > - reader side:if the reader reads a message from a stream and
then
> > > >> crashes,
> > > >> > when the reader restarts it would restart from last known position
> > > >> (DLSN).
> > > >> > If the reader fails after processing a record and before recording
> > the
> > > >> > position, the processed record will be delivered again.
> > > >> >
> > > >> > The reader problem can be properly addressed by making use of
the
> > > >> sequence
> > > >> > numbers of records and doing proper checkpointing. For example,
in
> > > >> > database, it can checkpoint the indexed data with the sequence
> > number
> > > of
> > > >> > records; in flink, it can checkpoint the state with the sequence
> > > >> numbers.
> > > >> >
> > > >> > The writer problem can be addressed by implementing an idempotent
> > > >> writer.
> > > >> > However, an alternative and more powerful approach is to support
> > > >> > transactions.
> > > >> >
> > > >> > *What does transaction mean?*
> > > >> >
> > > >> > A transaction means a collection of records can be written
> > > >> transactionally
> > > >> > within a stream or across multiple streams. They will be consumed
> by
> > > the
> > > >> > reader together when a transaction is committed, or will never
be
> > > >> consumed
> > > >> > by the reader when the transaction is aborted.
> > > >> >
> > > >> > The transaction will expose following guarantees:
> > > >> >
> > > >> > - The reader should not be exposed to records written from
> > uncommitted
> > > >> > transactions (mandatory)
> > > >> > - The reader should consume the records in the transaction commit
> > > order
> > > >> > rather than the record written order (mandatory)
> > > >> > - No duplicated records within a transaction (mandatory)
> > > >> > - Allow interleaving transactional writes and non-transactional
> > writes
> > > >> > (optional)
> > > >> >
> > > >> > *Stream Transaction & Namespace Transaction*
> > > >> >
> > > >> > There will be two types of transaction, one is Stream level
> > > transaction
> > > >> > (local transaction), while the other one is Namespace level
> > > transaction
> > > >> > (global transaction).
> > > >> >
> > > >> > The stream level transaction is a transactional operation on
> writing
> > > >> > records to one stream; the namespace level transaction is a
> > > >> transactional
> > > >> > operation on writing records to multiple streams.
> > > >> >
> > > >> > *Implementation Thoughts*
> > > >> >
> > > >> > - A transaction is consist of begin control record, a series
of
> data
> > > >> > records and commit/abort control record.
> > > >> > - The begin/commit/abort control record is written to a `commit`
> log
> > > >> > stream, while the data records will be written to normal data
log
> > > >> streams.
> > > >> > - The `commit` log stream will be the same log stream for
> > stream-level
> > > >> > transaction,  while it will be a *system* stream (or multiple
> system
> > > >> > streams) for namespace-level transactions.
> > > >> > - The transaction code looks like as below:
> > > >> >
> > > >> > <code>
> > > >> >
> > > >> > Transaction txn = client.transaction();
> > > >> > Future<DLSN> result1 = txn.write(stream-0, record);
> > > >> > Future<DLSN> result2 = txn.write(stream-1, record);
> > > >> > Future<DLSN> result3 = txn.write(stream-2, record);
> > > >> > Future<Pair<DLSN, DLSN>> result = txn.commit();
> > > >> >
> > > >> > </code>
> > > >> >
> > > >> > if the txn is committed, all the write futures will be satisfied
> > with
> > > >> their
> > > >> > written DLSNs. if the txn is aborted, all the write futures will
> be
> > > >> failed
> > > >> > together. there is no partial failure state.
> > > >> >
> > > >> > - The actually data flow will be:
> > > >> >
> > > >> > 1. writer get a transaction id from the owner of the `commit'
log
> > > stream
> > > >> > 1. write the begin control record (synchronously) with the
> > transaction
> > > >> id
> > > >> > 2. for each write within the same txn, it will be assigned a
local
> > > >> sequence
> > > >> > number starting from 0. the combination of transaction id and
> local
> > > >> > sequence number will be used later on by the readers to
> de-duplicate
> > > >> > records.
> > > >> > 3. the commit/abort control record will be written based on the
> > > results
> > > >> > from 2.
> > > >> >
> > > >> > - Application can supply a timeout for the transaction when
> > #begin() a
> > > >> > transaction. The owner of the `commit` log stream can abort
> > > transactions
> > > >> > that never be committed/aborted within their timeout.
> > > >> >
> > > >> > - Failures:
> > > >> >
> > > >> > * all the log records can be simply retried as they will be
> > > >> de-duplicated
> > > >> > probably at the reader side.
> > > >> >
> > > >> > - Reader:
> > > >> >
> > > >> > * Reader can be configured to read uncommitted records or
> committed
> > > >> records
> > > >> > only (by default read uncommitted records)
> > > >> > * If reader is configured to read committed records only, the
read
> > > ahead
> > > >> > cache will be changed to maintain one additional pending committed
> > > >> records.
> > > >> > the pending committed records map is bounded and records will
be
> > > dropped
> > > >> > when read ahead is moving.
> > > >> > * when the reader hits a commit record, it will rewind to the
> begin
> > > >> record
> > > >> > and start reading from there. leveraging the proper read ahead
> cache
> > > and
> > > >> > pending commit records cache, it would be good for both short
> > > >> transactions
> > > >> > and long transactions.
> > > >> >
> > > >> > - DLSN, SequenceId:
> > > >> >
> > > >> > * We will add a fourth field to DLSN. It is `local sequence
> number`
> > > >> within
> > > >> > a transaction session. So the new DLSN of records in a transaction
> > > will
> > > >> be
> > > >> > the DLSN of commit control record plus its local sequence number.
> > > >> > * The sequence id will be still the position of the commit record
> > plus
> > > >> its
> > > >> > local sequence number. The position will be advanced with total
> > number
> > > >> of
> > > >> > written records on writing the commit control record.
> > > >> >
> > > >> > - Transaction Group & Namespace Transaction
> > > >> >
> > > >> > using one single log stream for namespace transaction can cause
> the
> > > >> > bottleneck problem since all the begin/commit/end control records
> > will
> > > >> have
> > > >> > to go through one log stream.
> > > >> >
> > > >> > the idea of 'transaction group' is to allow partitioning the
> writers
> > > >> into
> > > >> > different transaction groups.
> > > >> >
> > > >> > clients can specify the `group-name` when starting the
> transaction.
> > if
> > > >> > there is no `group-name` specified, it will use the default
> `commit`
> > > >> log in
> > > >> > the namespace for creating transactions.
> > > >> >
> > > >> > -------------------------------------------------
> > > >> >
> > > >> > I'd like to collect feedbacks on this idea. Appreciate any
> comments
> > > and
> > > >> if
> > > >> > anyone is also interested in this idea, we'd like to collaborate
> > with
> > > >> the
> > > >> > community.
> > > >> >
> > > >> >
> > > >> > - Xi
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message