distributedlog-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sijie Guo <si...@apache.org>
Subject Re: [Discuss] Transaction Support
Date Sat, 07 Jan 2017 00:50:24 GMT
On Thu, Jan 5, 2017 at 10:56 PM, Xi Liu <xi.liu.ant@gmail.com> wrote:

> Asko and Sijie,
>
> Thank you so much for your feedbacks.
>
> We are not targeting at building a general XA transaction coordinator. The
> feature we want is be able to write data to multiple log streams in an
> atomic way.
>

So in other words, that means this feature is 'atomic writes across log
streams', no?


>
> I totally agreed with you about building minimal logic. We also don't want
> to enforce this feature to all the users of DL. Building the TC as a
> separated service sounds clear to me. We will do it follow your suggestion.
>
> I am also replying the comments to you and Leigh on the doc. Hopefully we
> can come to an agreement so that our changes can be accepted.
>
> - Xi
>
> On Wed, Jan 4, 2017 at 1:14 AM, Asko Kauppi <asko.kauppi@zalando.fi>
> wrote:
>
> > > Beside that, I have one general question - What is the major goal for
> > this
> > > feature? Are you targeting on building a general XA transaction
> > coordinator
> > > or just for supporting things like `copy-modify-write' style workflow?
> >
> > The use case I would have for transactions - at some level of the stack -
> > is supporting dynamic configurations.
> >
> > If a config changes in e.g. three lines, some of the changes may
> logically
> > belong together. E.g. changing both “host” and “port” (if separate
> > entries). One shouldn’t be able to read a state, even temporarily, that
> has
> > new host but old port.
> >
> > I can do this in the application level - it does not need to be part of
> > the DL protocol.
> >
> >
> > Asko Kauppi
> > Zalando Tech Helsinki
> >
> > > On 4 Jan 2017, at 9.18, Sijie Guo <sijie@apache.org> wrote:
> > >
> > > Sorry for late response. I think Leigh and you already had some very
> > > valuable discussions in the doc. I will try to add some of my questions
> > to
> > > the discussion.
> > >
> > > Beside that, I had a discussion with Leigh today about this. first of
> > all,
> > > I think it is very good to add transaction support in distributedlog.
> It
> > is
> > > one of the primitives that would help building distributed service. But
> > we
> > > have a concern about making this system become complicated and
> introduce
> > > operational overhead when it runs in the large scale system on
> > production.
> > > There are two major suggestions that I have for this feature -
> > >
> > > Build the 'minimum' logic in core - I think the minimum logic that need
> > to
> > > be added to the core is -  the special control records (begin, commit
> and
> > > abort) and make the reader be able to detect those special control
> > records
> > > and know what do they mean and how to interrupt with them. Since they
> are
> > > special control records, there is not overhead to other readers that
> > > doesn't require this feature.
> > >
> > > Build the transaction coordinator as a separated proxy service  - I
> think
> > > the major concern that we have is putting more complexities into the
> > 'write
> > > proxy' service. We architected distributedlog in a more
> microservice-like
> > > way - we have the core as the stream store, the proxy for serving write
> > and
> > > read traffic. It would be good that the transaction feature can be done
> > in
> > > a similar way. So the architecture would be like this -
> > >
> > > *[ write service ] [ read service ] [ transaction coordinator ]*
> > > *[ stream store
> > >            ]*
> > >
> > > if people doesn't need the transaction feature, they can turn if off
> > > completely without any operational overhead.
> > >
> > > Beside that, I have one general question - What is the major goal for
> > this
> > > feature? Are you targeting on building a general XA transaction
> > coordinator
> > > or just for supporting things like `copy-modify-write' style workflow?
> > >
> > >
> > > Thanks,
> > > Sijie
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Dec 28, 2016 at 1:12 PM, Xi Liu <xi.liu.ant@gmail.com> wrote:
> > >
> > >> Ping?
> > >>
> > >> On Mon, Dec 19, 2016 at 8:28 AM, Xi Liu <xi.liu.ant@gmail.com> wrote:
> > >>
> > >>> Sijie,
> > >>>
> > >>> No. I thought it might be easier for people to comment on a google
> doc
> > to
> > >>> gather the initial feedback. I will put the content back to wiki page
> > >> once
> > >>> addressing the comments. Does that sound good to you?
> > >>>
> > >>> And thank you in advance.
> > >>>
> > >>> - Xi
> > >>>
> > >>>
> > >>>
> > >>> On Sun, Dec 18, 2016 at 8:48 AM, Sijie Guo <sijie@apache.org>
wrote:
> > >>>
> > >>>> Hi Xi,
> > >>>>
> > >>>> sorry for late response. I will review it soon.
> > >>>>
> > >>>> regarding this, a separate question "are we going to use google
doc
> > >>>> instead
> > >>>> of email thread for any discussion"? I am a bit worried that the
> > >>>> discussion
> > >>>> will become lost after moving to google doc. No idea on how other
> > apache
> > >>>> projects are doing.
> > >>>>
> > >>>> - Sijie
> > >>>>
> > >>>> On Wed, Dec 14, 2016 at 11:41 PM, Xi Liu <xi.liu.ant@gmail.com>
> > wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> I finalized the first version of the design. This time I used
a
> > google
> > >>>> doc
> > >>>>> so that it is easier for commenting and add a link the wiki
page. I
> > >> will
> > >>>>> update this to the wiki page once we come to the finalized
design.
> > >>>>>
> > >>>>> https://docs.google.com/document/d/14Ns05M8Z5a6DF6fHmWQwISyD5jjeK
> > >>>>> bSIGgSzXuTI5BA/edit
> > >>>>>
> > >>>>> Let me know if you have any questions. Appreciate your reviews!
> > >>>>>
> > >>>>> - Xi
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Oct 28, 2016 at 7:58 AM, Leigh Stewart
> > >>>>> <lstewart@twitter.com.invalid
> > >>>>>> wrote:
> > >>>>>
> > >>>>>> Interesting proposal. A couple quick notes while you continue
to
> > >> flesh
> > >>>>> this
> > >>>>>> out.
> > >>>>>>
> > >>>>>> a. just to be sure - does this eliminate the need to save
seqno
> with
> > >>>>>> checkpoint?
> > >>>>>>
> > >>>>>> b. i.e. another way to describe this kind of improvement
is
> "support
> > >>>>>> records (atomic writes) larger than 1MB", iiuc. the advantage
> being
> > >> it
> > >>>>>> avoids the baggage of transactions. disadvantages include
> inability
> > >>>> to do
> > >>>>>> cross stream transactions, and flexibility (interleaving,
etc)
> (are
> > >>>> there
> > >>>>>> others?).
> > >>>>>>
> > >>>>>> c. proxy use case is for supporting multiple writers -
have you
> > >>>> thought
> > >>>>>> about how this would work with multiple writers?
> > >>>>>>
> > >>>>>> Thanks!
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, Oct 18, 2016 at 6:45 PM, Sijie Guo
> > >> <sijieg@twitter.com.invalid
> > >>>>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Sound good to me. look forward to the detailed proposal.
> > >>>>>>>
> > >>>>>>> (I don't mind the format if it makes things easier
to you)
> > >>>>>>>
> > >>>>>>> Sijie
> > >>>>>>>
> > >>>>>>> On Friday, October 14, 2016, Xi Liu <xi.liu.ant@gmail.com>
> wrote:
> > >>>>>>>
> > >>>>>>>> Thank you, Sijie
> > >>>>>>>>
> > >>>>>>>> We have some internal discussions to sort out some
details. We
> > >> are
> > >>>>>> ready
> > >>>>>>> to
> > >>>>>>>> collaborate with the community for adding the transaction
> > >> support
> > >>>> in
> > >>>>>> DL.
> > >>>>>>>> We'd like to share more.
> > >>>>>>>>
> > >>>>>>>> I created a proposal wiki here -
> > >>>>>>>> https://cwiki.apache.org/confluence/display/DL/DP-1+-+
> > >>>>>>>> DistributedLog+Transaction+Support
> > >>>>>>>>
> > >>>>>>>> (I followed KIP format and named it as DP (DistributedLog
> > >>>> Proposal -
> > >>>>> DP
> > >>>>>>> is
> > >>>>>>>> also short for Dynamic Programming). I don't know
if you guys
> > >> like
> > >>>>> this
> > >>>>>>>> name or not. Feel free to change it :D)
> > >>>>>>>>
> > >>>>>>>> I basically put my initial email as the content
there so far.
> > >>>> Once we
> > >>>>>>>> finished our final discussion, I will update with
more details.
> > >> At
> > >>>>> the
> > >>>>>>> same
> > >>>>>>>> time, any comments are welcome.
> > >>>>>>>>
> > >>>>>>>> - Xi
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Sat, Oct 8, 2016 at 6:58 AM, Sijie Guo <sijie@apache.org
> > >>>>>>> <javascript:;>>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Xi,
> > >>>>>>>>>
> > >>>>>>>>> I just granted you the edit permission.
> > >>>>>>>>>
> > >>>>>>>>> - Sijie
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Oct 7, 2016 at 10:34 AM, Xi Liu <xi.liu.ant@gmail.com
> > >>>>>>>> <javascript:;>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> I still can not edit the wiki. Can any
of the pmc members
> > >>>> grant
> > >>>>> me
> > >>>>>>> the
> > >>>>>>>>>> permissions?
> > >>>>>>>>>>
> > >>>>>>>>>> - Xi
> > >>>>>>>>>>
> > >>>>>>>>>> On Sat, Sep 17, 2016 at 10:35 PM, Xi Liu
<
> > >>>> xi.liu.ant@gmail.com
> > >>>>>>>> <javascript:;>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Sijie,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I attempted to create a wiki page under
that space. I
> > >> found
> > >>>>> that
> > >>>>>> I
> > >>>>>>> am
> > >>>>>>>>> not
> > >>>>>>>>>>> authorized with edit permission.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Can any of the committers grant me
the wiki edit
> > >>>> permission? My
> > >>>>>>>> account
> > >>>>>>>>>> is
> > >>>>>>>>>>> "xi.liu.ant".
> > >>>>>>>>>>>
> > >>>>>>>>>>> - Xi
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Tue, Sep 13, 2016 at 9:26 AM, Sijie
Guo <
> > >>>> sijie@apache.org
> > >>>>>>>> <javascript:;>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> This sounds interesting ... I will
take a closer look and
> > >>>> give
> > >>>>>> my
> > >>>>>>>>>> comments
> > >>>>>>>>>>>> later.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> At the same time, do you mind creating
a wiki page to put
> > >>>> your
> > >>>>>>> idea
> > >>>>>>>>>> there?
> > >>>>>>>>>>>> You can add your wiki page under
> > >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/DL/Project+
> > >>>>>> Proposals
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> You might need to ask in the dev
list to grant the wiki
> > >>>> edit
> > >>>>>>>>> permissions
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>> you once you have a wiki account.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> - Sijie
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Mon, Sep 12, 2016 at 2:20 AM,
Xi Liu <
> > >>>> xi.liu.ant@gmail.com
> > >>>>>>>> <javascript:;>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hello,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I asked the transaction support
in distributedlog user
> > >>>> group
> > >>>>>> two
> > >>>>>>>>>> months
> > >>>>>>>>>>>>> ago. I want to raise this up
again, as we are looking
> > >> for
> > >>>>>> using
> > >>>>>>>>>>>>> distributedlog for building
a transactional data
> > >>>> service. It
> > >>>>>> is
> > >>>>>>> a
> > >>>>>>>>>> major
> > >>>>>>>>>>>>> feature that is missing in
distributedlog. We have some
> > >>>>> ideas
> > >>>>>> to
> > >>>>>>>> add
> > >>>>>>>>>>>> this
> > >>>>>>>>>>>>> to distributedlog and want
to know if they make sense
> > >> or
> > >>>>> not.
> > >>>>>> If
> > >>>>>>>>> they
> > >>>>>>>>>>>> are
> > >>>>>>>>>>>>> good, we'd like to contribute
and develop with the
> > >>>>> community.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Here are the thoughts:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> -------------------------------------------------
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> From our understanding, DL
can provide "at-least-once"
> > >>>>>> delivery
> > >>>>>>>>>> semantic
> > >>>>>>>>>>>>> (if not, please correct me)
but not "exactly-once"
> > >>>> delivery
> > >>>>>>>>> semantic.
> > >>>>>>>>>>>> That
> > >>>>>>>>>>>>> means that a message can be
delivered one or more times
> > >>>> if
> > >>>>> the
> > >>>>>>>>> reader
> > >>>>>>>>>>>>> doesn't handle duplicates.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The duplicates come from two
places, one is at writer
> > >>>> side
> > >>>>>> (this
> > >>>>>>>>>> assumes
> > >>>>>>>>>>>>> using write proxy not the core
library), while the
> > >> other
> > >>>> one
> > >>>>>> is
> > >>>>>>> at
> > >>>>>>>>>>>> reader
> > >>>>>>>>>>>>> side.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - writer side: if the client
attempts to write a record
> > >>>> to
> > >>>>> the
> > >>>>>>>> write
> > >>>>>>>>>>>>> proxies and gets a network
error (e.g timeouts) then
> > >>>>> retries,
> > >>>>>>> the
> > >>>>>>>>>>>> retrying
> > >>>>>>>>>>>>> will potentially result in
duplicates.
> > >>>>>>>>>>>>> - reader side:if the reader
reads a message from a
> > >> stream
> > >>>>> and
> > >>>>>>> then
> > >>>>>>>>>>>> crashes,
> > >>>>>>>>>>>>> when the reader restarts it
would restart from last
> > >> known
> > >>>>>>> position
> > >>>>>>>>>>>> (DLSN).
> > >>>>>>>>>>>>> If the reader fails after processing
a record and
> > >> before
> > >>>>>>> recording
> > >>>>>>>>> the
> > >>>>>>>>>>>>> position, the processed record
will be delivered again.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The reader problem can be properly
addressed by making
> > >>>> use
> > >>>>> of
> > >>>>>>> the
> > >>>>>>>>>>>> sequence
> > >>>>>>>>>>>>> numbers of records and doing
proper checkpointing. For
> > >>>>>> example,
> > >>>>>>> in
> > >>>>>>>>>>>>> database, it can checkpoint
the indexed data with the
> > >>>>> sequence
> > >>>>>>>>> number
> > >>>>>>>>>> of
> > >>>>>>>>>>>>> records; in flink, it can checkpoint
the state with the
> > >>>>>> sequence
> > >>>>>>>>>>>> numbers.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The writer problem can be addressed
by implementing an
> > >>>>>>> idempotent
> > >>>>>>>>>>>> writer.
> > >>>>>>>>>>>>> However, an alternative and
more powerful approach is
> > >> to
> > >>>>>> support
> > >>>>>>>>>>>>> transactions.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> *What does transaction mean?*
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> A transaction means a collection
of records can be
> > >>>> written
> > >>>>>>>>>>>> transactionally
> > >>>>>>>>>>>>> within a stream or across multiple
streams. They will
> > >> be
> > >>>>>>> consumed
> > >>>>>>>> by
> > >>>>>>>>>> the
> > >>>>>>>>>>>>> reader together when a transaction
is committed, or
> > >> will
> > >>>>> never
> > >>>>>>> be
> > >>>>>>>>>>>> consumed
> > >>>>>>>>>>>>> by the reader when the transaction
is aborted.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The transaction will expose
following guarantees:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - The reader should not be
exposed to records written
> > >>>> from
> > >>>>>>>>> uncommitted
> > >>>>>>>>>>>>> transactions (mandatory)
> > >>>>>>>>>>>>> - The reader should consume
the records in the
> > >>>> transaction
> > >>>>>>> commit
> > >>>>>>>>>> order
> > >>>>>>>>>>>>> rather than the record written
order (mandatory)
> > >>>>>>>>>>>>> - No duplicated records within
a transaction
> > >> (mandatory)
> > >>>>>>>>>>>>> - Allow interleaving transactional
writes and
> > >>>>>> non-transactional
> > >>>>>>>>> writes
> > >>>>>>>>>>>>> (optional)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> *Stream Transaction & Namespace
Transaction*
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> There will be two types of
transaction, one is Stream
> > >>>> level
> > >>>>>>>>>> transaction
> > >>>>>>>>>>>>> (local transaction), while
the other one is Namespace
> > >>>> level
> > >>>>>>>>>> transaction
> > >>>>>>>>>>>>> (global transaction).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The stream level transaction
is a transactional
> > >>>> operation on
> > >>>>>>>> writing
> > >>>>>>>>>>>>> records to one stream; the
namespace level transaction
> > >>>> is a
> > >>>>>>>>>>>> transactional
> > >>>>>>>>>>>>> operation on writing records
to multiple streams.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> *Implementation Thoughts*
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - A transaction is consist
of begin control record, a
> > >>>> series
> > >>>>>> of
> > >>>>>>>> data
> > >>>>>>>>>>>>> records and commit/abort control
record.
> > >>>>>>>>>>>>> - The begin/commit/abort control
record is written to a
> > >>>>>> `commit`
> > >>>>>>>> log
> > >>>>>>>>>>>>> stream, while the data records
will be written to
> > >> normal
> > >>>>> data
> > >>>>>>> log
> > >>>>>>>>>>>> streams.
> > >>>>>>>>>>>>> - The `commit` log stream will
be the same log stream
> > >> for
> > >>>>>>>>> stream-level
> > >>>>>>>>>>>>> transaction,  while it will
be a *system* stream (or
> > >>>>> multiple
> > >>>>>>>> system
> > >>>>>>>>>>>>> streams) for namespace-level
transactions.
> > >>>>>>>>>>>>> - The transaction code looks
like as below:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> <code>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Transaction txn = client.transaction();
> > >>>>>>>>>>>>> Future<DLSN> result1
= txn.write(stream-0, record);
> > >>>>>>>>>>>>> Future<DLSN> result2
= txn.write(stream-1, record);
> > >>>>>>>>>>>>> Future<DLSN> result3
= txn.write(stream-2, record);
> > >>>>>>>>>>>>> Future<Pair<DLSN, DLSN>>
result = txn.commit();
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> </code>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> if the txn is committed, all
the write futures will be
> > >>>>>> satisfied
> > >>>>>>>>> with
> > >>>>>>>>>>>> their
> > >>>>>>>>>>>>> written DLSNs. if the txn is
aborted, all the write
> > >>>> futures
> > >>>>>> will
> > >>>>>>>> be
> > >>>>>>>>>>>> failed
> > >>>>>>>>>>>>> together. there is no partial
failure state.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - The actually data flow will
be:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 1. writer get a transaction
id from the owner of the
> > >>>>> `commit'
> > >>>>>>> log
> > >>>>>>>>>> stream
> > >>>>>>>>>>>>> 1. write the begin control
record (synchronously) with
> > >>>> the
> > >>>>>>>>> transaction
> > >>>>>>>>>>>> id
> > >>>>>>>>>>>>> 2. for each write within the
same txn, it will be
> > >>>> assigned a
> > >>>>>>> local
> > >>>>>>>>>>>> sequence
> > >>>>>>>>>>>>> number starting from 0. the
combination of transaction
> > >> id
> > >>>>> and
> > >>>>>>>> local
> > >>>>>>>>>>>>> sequence number will be used
later on by the readers to
> > >>>>>>>> de-duplicate
> > >>>>>>>>>>>>> records.
> > >>>>>>>>>>>>> 3. the commit/abort control
record will be written
> > >> based
> > >>>> on
> > >>>>>> the
> > >>>>>>>>>> results
> > >>>>>>>>>>>>> from 2.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - Application can supply a
timeout for the transaction
> > >>>> when
> > >>>>>>>>> #begin() a
> > >>>>>>>>>>>>> transaction. The owner of the
`commit` log stream can
> > >>>> abort
> > >>>>>>>>>> transactions
> > >>>>>>>>>>>>> that never be committed/aborted
within their timeout.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - Failures:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> * all the log records can be
simply retried as they
> > >> will
> > >>>> be
> > >>>>>>>>>>>> de-duplicated
> > >>>>>>>>>>>>> probably at the reader side.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - Reader:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> * Reader can be configured
to read uncommitted records
> > >> or
> > >>>>>>>> committed
> > >>>>>>>>>>>> records
> > >>>>>>>>>>>>> only (by default read uncommitted
records)
> > >>>>>>>>>>>>> * If reader is configured to
read committed records
> > >> only,
> > >>>>> the
> > >>>>>>> read
> > >>>>>>>>>> ahead
> > >>>>>>>>>>>>> cache will be changed to maintain
one additional
> > >> pending
> > >>>>>>> committed
> > >>>>>>>>>>>> records.
> > >>>>>>>>>>>>> the pending committed records
map is bounded and
> > >> records
> > >>>>> will
> > >>>>>> be
> > >>>>>>>>>> dropped
> > >>>>>>>>>>>>> when read ahead is moving.
> > >>>>>>>>>>>>> * when the reader hits a commit
record, it will rewind
> > >> to
> > >>>>> the
> > >>>>>>>> begin
> > >>>>>>>>>>>> record
> > >>>>>>>>>>>>> and start reading from there.
leveraging the proper
> > >> read
> > >>>>> ahead
> > >>>>>>>> cache
> > >>>>>>>>>> and
> > >>>>>>>>>>>>> pending commit records cache,
it would be good for both
> > >>>>> short
> > >>>>>>>>>>>> transactions
> > >>>>>>>>>>>>> and long transactions.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - DLSN, SequenceId:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> * We will add a fourth field
to DLSN. It is `local
> > >>>> sequence
> > >>>>>>>> number`
> > >>>>>>>>>>>> within
> > >>>>>>>>>>>>> a transaction session. So the
new DLSN of records in a
> > >>>>>>> transaction
> > >>>>>>>>>> will
> > >>>>>>>>>>>> be
> > >>>>>>>>>>>>> the DLSN of commit control
record plus its local
> > >> sequence
> > >>>>>>> number.
> > >>>>>>>>>>>>> * The sequence id will be still
the position of the
> > >>>> commit
> > >>>>>>> record
> > >>>>>>>>> plus
> > >>>>>>>>>>>> its
> > >>>>>>>>>>>>> local sequence number. The
position will be advanced
> > >> with
> > >>>>>> total
> > >>>>>>>>> number
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>> written records on writing
the commit control record.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - Transaction Group & Namespace
Transaction
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> using one single log stream
for namespace transaction
> > >> can
> > >>>>>> cause
> > >>>>>>>> the
> > >>>>>>>>>>>>> bottleneck problem since all
the begin/commit/end
> > >> control
> > >>>>>>> records
> > >>>>>>>>> will
> > >>>>>>>>>>>> have
> > >>>>>>>>>>>>> to go through one log stream.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> the idea of 'transaction group'
is to allow
> > >> partitioning
> > >>>> the
> > >>>>>>>> writers
> > >>>>>>>>>>>> into
> > >>>>>>>>>>>>> different transaction groups.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> clients can specify the `group-name`
when starting the
> > >>>>>>>> transaction.
> > >>>>>>>>> if
> > >>>>>>>>>>>>> there is no `group-name` specified,
it will use the
> > >>>> default
> > >>>>>>>> `commit`
> > >>>>>>>>>>>> log in
> > >>>>>>>>>>>>> the namespace for creating
transactions.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> -------------------------------------------------
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I'd like to collect feedbacks
on this idea. Appreciate
> > >>>> any
> > >>>>>>>> comments
> > >>>>>>>>>> and
> > >>>>>>>>>>>> if
> > >>>>>>>>>>>>> anyone is also interested in
this idea, we'd like to
> > >>>>>> collaborate
> > >>>>>>>>> with
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>> community.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - Xi
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message