distributedlog-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Asko Kauppi <asko.kau...@zalando.fi>
Subject Re: [Discuss] Transaction Support
Date Fri, 06 Jan 2017 11:48:43 GMT
Xi, I'm curious about:

>be able to write data to multiple log streams in an atomic way.

Do you intend to group such logs together, so that one producer would be
able to "own" them and write to any of them, with other writers waiting
until (s)he's done. I'd like to know more of the use case.


On 6 January 2017 at 08:56, Xi Liu <xi.liu.ant@gmail.com> wrote:

> Asko and Sijie,
>
> Thank you so much for your feedbacks.
>
> We are not targeting at building a general XA transaction coordinator. The
> feature we want is be able to write data to multiple log streams in an
> atomic way.
>
> I totally agreed with you about building minimal logic. We also don't want
> to enforce this feature to all the users of DL. Building the TC as a
> separated service sounds clear to me. We will do it follow your suggestion.
>
> I am also replying the comments to you and Leigh on the doc. Hopefully we
> can come to an agreement so that our changes can be accepted.
>
> - Xi
>
> On Wed, Jan 4, 2017 at 1:14 AM, Asko Kauppi <asko.kauppi@zalando.fi>
> wrote:
>
> > > Beside that, I have one general question - What is the major goal for
> > this
> > > feature? Are you targeting on building a general XA transaction
> > coordinator
> > > or just for supporting things like `copy-modify-write' style workflow?
> >
> > The use case I would have for transactions - at some level of the stack -
> > is supporting dynamic configurations.
> >
> > If a config changes in e.g. three lines, some of the changes may
> logically
> > belong together. E.g. changing both “host” and “port” (if separate
> > entries). One shouldn’t be able to read a state, even temporarily, that
> has
> > new host but old port.
> >
> > I can do this in the application level - it does not need to be part of
> > the DL protocol.
> >
> >
> > Asko Kauppi
> > Zalando Tech Helsinki
> >
> > > On 4 Jan 2017, at 9.18, Sijie Guo <sijie@apache.org> wrote:
> > >
> > > Sorry for late response. I think Leigh and you already had some very
> > > valuable discussions in the doc. I will try to add some of my questions
> > to
> > > the discussion.
> > >
> > > Beside that, I had a discussion with Leigh today about this. first of
> > all,
> > > I think it is very good to add transaction support in distributedlog.
> It
> > is
> > > one of the primitives that would help building distributed service. But
> > we
> > > have a concern about making this system become complicated and
> introduce
> > > operational overhead when it runs in the large scale system on
> > production.
> > > There are two major suggestions that I have for this feature -
> > >
> > > Build the 'minimum' logic in core - I think the minimum logic that need
> > to
> > > be added to the core is -  the special control records (begin, commit
> and
> > > abort) and make the reader be able to detect those special control
> > records
> > > and know what do they mean and how to interrupt with them. Since they
> are
> > > special control records, there is not overhead to other readers that
> > > doesn't require this feature.
> > >
> > > Build the transaction coordinator as a separated proxy service  - I
> think
> > > the major concern that we have is putting more complexities into the
> > 'write
> > > proxy' service. We architected distributedlog in a more
> microservice-like
> > > way - we have the core as the stream store, the proxy for serving write
> > and
> > > read traffic. It would be good that the transaction feature can be done
> > in
> > > a similar way. So the architecture would be like this -
> > >
> > > *[ write service ] [ read service ] [ transaction coordinator ]*
> > > *[ stream store
> > >            ]*
> > >
> > > if people doesn't need the transaction feature, they can turn if off
> > > completely without any operational overhead.
> > >
> > > Beside that, I have one general question - What is the major goal for
> > this
> > > feature? Are you targeting on building a general XA transaction
> > coordinator
> > > or just for supporting things like `copy-modify-write' style workflow?
> > >
> > >
> > > Thanks,
> > > Sijie
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Dec 28, 2016 at 1:12 PM, Xi Liu <xi.liu.ant@gmail.com> wrote:
> > >
> > >> Ping?
> > >>
> > >> On Mon, Dec 19, 2016 at 8:28 AM, Xi Liu <xi.liu.ant@gmail.com> wrote:
> > >>
> > >>> Sijie,
> > >>>
> > >>> No. I thought it might be easier for people to comment on a google
> doc
> > to
> > >>> gather the initial feedback. I will put the content back to wiki page
> > >> once
> > >>> addressing the comments. Does that sound good to you?
> > >>>
> > >>> And thank you in advance.
> > >>>
> > >>> - Xi
> > >>>
> > >>>
> > >>>
> > >>> On Sun, Dec 18, 2016 at 8:48 AM, Sijie Guo <sijie@apache.org>
wrote:
> > >>>
> > >>>> Hi Xi,
> > >>>>
> > >>>> sorry for late response. I will review it soon.
> > >>>>
> > >>>> regarding this, a separate question "are we going to use google
doc
> > >>>> instead
> > >>>> of email thread for any discussion"? I am a bit worried that the
> > >>>> discussion
> > >>>> will become lost after moving to google doc. No idea on how other
> > apache
> > >>>> projects are doing.
> > >>>>
> > >>>> - Sijie
> > >>>>
> > >>>> On Wed, Dec 14, 2016 at 11:41 PM, Xi Liu <xi.liu.ant@gmail.com>
> > wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> I finalized the first version of the design. This time I used
a
> > google
> > >>>> doc
> > >>>>> so that it is easier for commenting and add a link the wiki
page. I
> > >> will
> > >>>>> update this to the wiki page once we come to the finalized
design.
> > >>>>>
> > >>>>> https://docs.google.com/document/d/14Ns05M8Z5a6DF6fHmWQwISyD5jjeK
> > >>>>> bSIGgSzXuTI5BA/edit
> > >>>>>
> > >>>>> Let me know if you have any questions. Appreciate your reviews!
> > >>>>>
> > >>>>> - Xi
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Oct 28, 2016 at 7:58 AM, Leigh Stewart
> > >>>>> <lstewart@twitter.com.invalid
> > >>>>>> wrote:
> > >>>>>
> > >>>>>> Interesting proposal. A couple quick notes while you continue
to
> > >> flesh
> > >>>>> this
> > >>>>>> out.
> > >>>>>>
> > >>>>>> a. just to be sure - does this eliminate the need to save
seqno
> with
> > >>>>>> checkpoint?
> > >>>>>>
> > >>>>>> b. i.e. another way to describe this kind of improvement
is
> "support
> > >>>>>> records (atomic writes) larger than 1MB", iiuc. the advantage
> being
> > >> it
> > >>>>>> avoids the baggage of transactions. disadvantages include
> inability
> > >>>> to do
> > >>>>>> cross stream transactions, and flexibility (interleaving,
etc)
> (are
> > >>>> there
> > >>>>>> others?).
> > >>>>>>
> > >>>>>> c. proxy use case is for supporting multiple writers -
have you
> > >>>> thought
> > >>>>>> about how this would work with multiple writers?
> > >>>>>>
> > >>>>>> Thanks!
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, Oct 18, 2016 at 6:45 PM, Sijie Guo
> > >> <sijieg@twitter.com.invalid
> > >>>>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Sound good to me. look forward to the detailed proposal.
> > >>>>>>>
> > >>>>>>> (I don't mind the format if it makes things easier
to you)
> > >>>>>>>
> > >>>>>>> Sijie
> > >>>>>>>
> > >>>>>>> On Friday, October 14, 2016, Xi Liu <xi.liu.ant@gmail.com>
> wrote:
> > >>>>>>>
> > >>>>>>>> Thank you, Sijie
> > >>>>>>>>
> > >>>>>>>> We have some internal discussions to sort out some
details. We
> > >> are
> > >>>>>> ready
> > >>>>>>> to
> > >>>>>>>> collaborate with the community for adding the transaction
> > >> support
> > >>>> in
> > >>>>>> DL.
> > >>>>>>>> We'd like to share more.
> > >>>>>>>>
> > >>>>>>>> I created a proposal wiki here -
> > >>>>>>>> https://cwiki.apache.org/confluence/display/DL/DP-1+-+
> > >>>>>>>> DistributedLog+Transaction+Support
> > >>>>>>>>
> > >>>>>>>> (I followed KIP format and named it as DP (DistributedLog
> > >>>> Proposal -
> > >>>>> DP
> > >>>>>>> is
> > >>>>>>>> also short for Dynamic Programming). I don't know
if you guys
> > >> like
> > >>>>> this
> > >>>>>>>> name or not. Feel free to change it :D)
> > >>>>>>>>
> > >>>>>>>> I basically put my initial email as the content
there so far.
> > >>>> Once we
> > >>>>>>>> finished our final discussion, I will update with
more details.
> > >> At
> > >>>>> the
> > >>>>>>> same
> > >>>>>>>> time, any comments are welcome.
> > >>>>>>>>
> > >>>>>>>> - Xi
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Sat, Oct 8, 2016 at 6:58 AM, Sijie Guo <sijie@apache.org
> > >>>>>>> <javascript:;>>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Xi,
> > >>>>>>>>>
> > >>>>>>>>> I just granted you the edit permission.
> > >>>>>>>>>
> > >>>>>>>>> - Sijie
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Oct 7, 2016 at 10:34 AM, Xi Liu <xi.liu.ant@gmail.com
> > >>>>>>>> <javascript:;>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> I still can not edit the wiki. Can any
of the pmc members
> > >>>> grant
> > >>>>> me
> > >>>>>>> the
> > >>>>>>>>>> permissions?
> > >>>>>>>>>>
> > >>>>>>>>>> - Xi
> > >>>>>>>>>>
> > >>>>>>>>>> On Sat, Sep 17, 2016 at 10:35 PM, Xi Liu
<
> > >>>> xi.liu.ant@gmail.com
> > >>>>>>>> <javascript:;>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Sijie,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I attempted to create a wiki page under
that space. I
> > >> found
> > >>>>> that
> > >>>>>> I
> > >>>>>>> am
> > >>>>>>>>> not
> > >>>>>>>>>>> authorized with edit permission.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Can any of the committers grant me
the wiki edit
> > >>>> permission? My
> > >>>>>>>> account
> > >>>>>>>>>> is
> > >>>>>>>>>>> "xi.liu.ant".
> > >>>>>>>>>>>
> > >>>>>>>>>>> - Xi
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Tue, Sep 13, 2016 at 9:26 AM, Sijie
Guo <
> > >>>> sijie@apache.org
> > >>>>>>>> <javascript:;>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> This sounds interesting ... I will
take a closer look and
> > >>>> give
> > >>>>>> my
> > >>>>>>>>>> comments
> > >>>>>>>>>>>> later.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> At the same time, do you mind creating
a wiki page to put
> > >>>> your
> > >>>>>>> idea
> > >>>>>>>>>> there?
> > >>>>>>>>>>>> You can add your wiki page under
> > >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/DL/Project+
> > >>>>>> Proposals
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> You might need to ask in the dev
list to grant the wiki
> > >>>> edit
> > >>>>>>>>> permissions
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>> you once you have a wiki account.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> - Sijie
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Mon, Sep 12, 2016 at 2:20 AM,
Xi Liu <
> > >>>> xi.liu.ant@gmail.com
> > >>>>>>>> <javascript:;>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hello,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I asked the transaction support
in distributedlog user
> > >>>> group
> > >>>>>> two
> > >>>>>>>>>> months
> > >>>>>>>>>>>>> ago. I want to raise this up
again, as we are looking
> > >> for
> > >>>>>> using
> > >>>>>>>>>>>>> distributedlog for building
a transactional data
> > >>>> service. It
> > >>>>>> is
> > >>>>>>> a
> > >>>>>>>>>> major
> > >>>>>>>>>>>>> feature that is missing in
distributedlog. We have some
> > >>>>> ideas
> > >>>>>> to
> > >>>>>>>> add
> > >>>>>>>>>>>> this
> > >>>>>>>>>>>>> to distributedlog and want
to know if they make sense
> > >> or
> > >>>>> not.
> > >>>>>> If
> > >>>>>>>>> they
> > >>>>>>>>>>>> are
> > >>>>>>>>>>>>> good, we'd like to contribute
and develop with the
> > >>>>> community.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Here are the thoughts:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> -------------------------------------------------
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> From our understanding, DL
can provide "at-least-once"
> > >>>>>> delivery
> > >>>>>>>>>> semantic
> > >>>>>>>>>>>>> (if not, please correct me)
but not "exactly-once"
> > >>>> delivery
> > >>>>>>>>> semantic.
> > >>>>>>>>>>>> That
> > >>>>>>>>>>>>> means that a message can be
delivered one or more times
> > >>>> if
> > >>>>> the
> > >>>>>>>>> reader
> > >>>>>>>>>>>>> doesn't handle duplicates.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The duplicates come from two
places, one is at writer
> > >>>> side
> > >>>>>> (this
> > >>>>>>>>>> assumes
> > >>>>>>>>>>>>> using write proxy not the core
library), while the
> > >> other
> > >>>> one
> > >>>>>> is
> > >>>>>>> at
> > >>>>>>>>>>>> reader
> > >>>>>>>>>>>>> side.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - writer side: if the client
attempts to write a record
> > >>>> to
> > >>>>> the
> > >>>>>>>> write
> > >>>>>>>>>>>>> proxies and gets a network
error (e.g timeouts) then
> > >>>>> retries,
> > >>>>>>> the
> > >>>>>>>>>>>> retrying
> > >>>>>>>>>>>>> will potentially result in
duplicates.
> > >>>>>>>>>>>>> - reader side:if the reader
reads a message from a
> > >> stream
> > >>>>> and
> > >>>>>>> then
> > >>>>>>>>>>>> crashes,
> > >>>>>>>>>>>>> when the reader restarts it
would restart from last
> > >> known
> > >>>>>>> position
> > >>>>>>>>>>>> (DLSN).
> > >>>>>>>>>>>>> If the reader fails after processing
a record and
> > >> before
> > >>>>>>> recording
> > >>>>>>>>> the
> > >>>>>>>>>>>>> position, the processed record
will be delivered again.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The reader problem can be properly
addressed by making
> > >>>> use
> > >>>>> of
> > >>>>>>> the
> > >>>>>>>>>>>> sequence
> > >>>>>>>>>>>>> numbers of records and doing
proper checkpointing. For
> > >>>>>> example,
> > >>>>>>> in
> > >>>>>>>>>>>>> database, it can checkpoint
the indexed data with the
> > >>>>> sequence
> > >>>>>>>>> number
> > >>>>>>>>>> of
> > >>>>>>>>>>>>> records; in flink, it can checkpoint
the state with the
> > >>>>>> sequence
> > >>>>>>>>>>>> numbers.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The writer problem can be addressed
by implementing an
> > >>>>>>> idempotent
> > >>>>>>>>>>>> writer.
> > >>>>>>>>>>>>> However, an alternative and
more powerful approach is
> > >> to
> > >>>>>> support
> > >>>>>>>>>>>>> transactions.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> *What does transaction mean?*
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> A transaction means a collection
of records can be
> > >>>> written
> > >>>>>>>>>>>> transactionally
> > >>>>>>>>>>>>> within a stream or across multiple
streams. They will
> > >> be
> > >>>>>>> consumed
> > >>>>>>>> by
> > >>>>>>>>>> the
> > >>>>>>>>>>>>> reader together when a transaction
is committed, or
> > >> will
> > >>>>> never
> > >>>>>>> be
> > >>>>>>>>>>>> consumed
> > >>>>>>>>>>>>> by the reader when the transaction
is aborted.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The transaction will expose
following guarantees:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - The reader should not be
exposed to records written
> > >>>> from
> > >>>>>>>>> uncommitted
> > >>>>>>>>>>>>> transactions (mandatory)
> > >>>>>>>>>>>>> - The reader should consume
the records in the
> > >>>> transaction
> > >>>>>>> commit
> > >>>>>>>>>> order
> > >>>>>>>>>>>>> rather than the record written
order (mandatory)
> > >>>>>>>>>>>>> - No duplicated records within
a transaction
> > >> (mandatory)
> > >>>>>>>>>>>>> - Allow interleaving transactional
writes and
> > >>>>>> non-transactional
> > >>>>>>>>> writes
> > >>>>>>>>>>>>> (optional)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> *Stream Transaction & Namespace
Transaction*
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> There will be two types of
transaction, one is Stream
> > >>>> level
> > >>>>>>>>>> transaction
> > >>>>>>>>>>>>> (local transaction), while
the other one is Namespace
> > >>>> level
> > >>>>>>>>>> transaction
> > >>>>>>>>>>>>> (global transaction).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> The stream level transaction
is a transactional
> > >>>> operation on
> > >>>>>>>> writing
> > >>>>>>>>>>>>> records to one stream; the
namespace level transaction
> > >>>> is a
> > >>>>>>>>>>>> transactional
> > >>>>>>>>>>>>> operation on writing records
to multiple streams.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> *Implementation Thoughts*
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - A transaction is consist
of begin control record, a
> > >>>> series
> > >>>>>> of
> > >>>>>>>> data
> > >>>>>>>>>>>>> records and commit/abort control
record.
> > >>>>>>>>>>>>> - The begin/commit/abort control
record is written to a
> > >>>>>> `commit`
> > >>>>>>>> log
> > >>>>>>>>>>>>> stream, while the data records
will be written to
> > >> normal
> > >>>>> data
> > >>>>>>> log
> > >>>>>>>>>>>> streams.
> > >>>>>>>>>>>>> - The `commit` log stream will
be the same log stream
> > >> for
> > >>>>>>>>> stream-level
> > >>>>>>>>>>>>> transaction,  while it will
be a *system* stream (or
> > >>>>> multiple
> > >>>>>>>> system
> > >>>>>>>>>>>>> streams) for namespace-level
transactions.
> > >>>>>>>>>>>>> - The transaction code looks
like as below:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> <code>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Transaction txn = client.transaction();
> > >>>>>>>>>>>>> Future<DLSN> result1
= txn.write(stream-0, record);
> > >>>>>>>>>>>>> Future<DLSN> result2
= txn.write(stream-1, record);
> > >>>>>>>>>>>>> Future<DLSN> result3
= txn.write(stream-2, record);
> > >>>>>>>>>>>>> Future<Pair<DLSN, DLSN>>
result = txn.commit();
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> </code>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> if the txn is committed, all
the write futures will be
> > >>>>>> satisfied
> > >>>>>>>>> with
> > >>>>>>>>>>>> their
> > >>>>>>>>>>>>> written DLSNs. if the txn is
aborted, all the write
> > >>>> futures
> > >>>>>> will
> > >>>>>>>> be
> > >>>>>>>>>>>> failed
> > >>>>>>>>>>>>> together. there is no partial
failure state.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - The actually data flow will
be:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 1. writer get a transaction
id from the owner of the
> > >>>>> `commit'
> > >>>>>>> log
> > >>>>>>>>>> stream
> > >>>>>>>>>>>>> 1. write the begin control
record (synchronously) with
> > >>>> the
> > >>>>>>>>> transaction
> > >>>>>>>>>>>> id
> > >>>>>>>>>>>>> 2. for each write within the
same txn, it will be
> > >>>> assigned a
> > >>>>>>> local
> > >>>>>>>>>>>> sequence
> > >>>>>>>>>>>>> number starting from 0. the
combination of transaction
> > >> id
> > >>>>> and
> > >>>>>>>> local
> > >>>>>>>>>>>>> sequence number will be used
later on by the readers to
> > >>>>>>>> de-duplicate
> > >>>>>>>>>>>>> records.
> > >>>>>>>>>>>>> 3. the commit/abort control
record will be written
> > >> based
> > >>>> on
> > >>>>>> the
> > >>>>>>>>>> results
> > >>>>>>>>>>>>> from 2.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - Application can supply a
timeout for the transaction
> > >>>> when
> > >>>>>>>>> #begin() a
> > >>>>>>>>>>>>> transaction. The owner of the
`commit` log stream can
> > >>>> abort
> > >>>>>>>>>> transactions
> > >>>>>>>>>>>>> that never be committed/aborted
within their timeout.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - Failures:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> * all the log records can be
simply retried as they
> > >> will
> > >>>> be
> > >>>>>>>>>>>> de-duplicated
> > >>>>>>>>>>>>> probably at the reader side.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - Reader:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> * Reader can be configured
to read uncommitted records
> > >> or
> > >>>>>>>> committed
> > >>>>>>>>>>>> records
> > >>>>>>>>>>>>> only (by default read uncommitted
records)
> > >>>>>>>>>>>>> * If reader is configured to
read committed records
> > >> only,
> > >>>>> the
> > >>>>>>> read
> > >>>>>>>>>> ahead
> > >>>>>>>>>>>>> cache will be changed to maintain
one additional
> > >> pending
> > >>>>>>> committed
> > >>>>>>>>>>>> records.
> > >>>>>>>>>>>>> the pending committed records
map is bounded and
> > >> records
> > >>>>> will
> > >>>>>> be
> > >>>>>>>>>> dropped
> > >>>>>>>>>>>>> when read ahead is moving.
> > >>>>>>>>>>>>> * when the reader hits a commit
record, it will rewind
> > >> to
> > >>>>> the
> > >>>>>>>> begin
> > >>>>>>>>>>>> record
> > >>>>>>>>>>>>> and start reading from there.
leveraging the proper
> > >> read
> > >>>>> ahead
> > >>>>>>>> cache
> > >>>>>>>>>> and
> > >>>>>>>>>>>>> pending commit records cache,
it would be good for both
> > >>>>> short
> > >>>>>>>>>>>> transactions
> > >>>>>>>>>>>>> and long transactions.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - DLSN, SequenceId:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> * We will add a fourth field
to DLSN. It is `local
> > >>>> sequence
> > >>>>>>>> number`
> > >>>>>>>>>>>> within
> > >>>>>>>>>>>>> a transaction session. So the
new DLSN of records in a
> > >>>>>>> transaction
> > >>>>>>>>>> will
> > >>>>>>>>>>>> be
> > >>>>>>>>>>>>> the DLSN of commit control
record plus its local
> > >> sequence
> > >>>>>>> number.
> > >>>>>>>>>>>>> * The sequence id will be still
the position of the
> > >>>> commit
> > >>>>>>> record
> > >>>>>>>>> plus
> > >>>>>>>>>>>> its
> > >>>>>>>>>>>>> local sequence number. The
position will be advanced
> > >> with
> > >>>>>> total
> > >>>>>>>>> number
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>> written records on writing
the commit control record.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - Transaction Group & Namespace
Transaction
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> using one single log stream
for namespace transaction
> > >> can
> > >>>>>> cause
> > >>>>>>>> the
> > >>>>>>>>>>>>> bottleneck problem since all
the begin/commit/end
> > >> control
> > >>>>>>> records
> > >>>>>>>>> will
> > >>>>>>>>>>>> have
> > >>>>>>>>>>>>> to go through one log stream.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> the idea of 'transaction group'
is to allow
> > >> partitioning
> > >>>> the
> > >>>>>>>> writers
> > >>>>>>>>>>>> into
> > >>>>>>>>>>>>> different transaction groups.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> clients can specify the `group-name`
when starting the
> > >>>>>>>> transaction.
> > >>>>>>>>> if
> > >>>>>>>>>>>>> there is no `group-name` specified,
it will use the
> > >>>> default
> > >>>>>>>> `commit`
> > >>>>>>>>>>>> log in
> > >>>>>>>>>>>>> the namespace for creating
transactions.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> -------------------------------------------------
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I'd like to collect feedbacks
on this idea. Appreciate
> > >>>> any
> > >>>>>>>> comments
> > >>>>>>>>>> and
> > >>>>>>>>>>>> if
> > >>>>>>>>>>>>> anyone is also interested in
this idea, we'd like to
> > >>>>>> collaborate
> > >>>>>>>>> with
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>> community.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> - Xi
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message