asterixdb-dev mailing list archives

From: Steven Jacobs <sjaco...@ucr.edu>
Subject: Re: MultiTransactionJobletEventListenerFactory
Date: Fri, 17 Nov 2017 19:51:24 GMT
For option 1, I think the dataset id is not a unique identifier. Couldn't
multiple transactions in one job work on the same dataset?

Steven

On Fri, Nov 17, 2017 at 11:38 AM, abdullah alamoudi <bamousaa@gmail.com>
wrote:

> So, there are three options to do this:
> 1. Each of these operators works on a specific dataset, so we can pass
> the datasetId to the JobEventListenerFactory when requesting the
> transaction id.
> 2. We make one transaction work for multiple datasets by using a map from
> datasetId to the primary opTracker and use it when the log flusher thread
> reports commits.
> 3. Prevent a job from having multiple transactions. (For the record, I
> dislike this option since the price we pay is very high, IMO.)
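A minimal sketch of what option 1 (together with option 2's map keyed by datasetId) could look like. The names DatasetAwareJobEventListenerFactory and TxnIdGenerator are illustrative only, not the actual AsterixDB classes:

    // Hypothetical sketch only, not the real AsterixDB implementation.
    import java.util.HashMap;
    import java.util.Map;

    public class DatasetAwareJobEventListenerFactory {
        // one transaction id per dataset touched by this job
        private final Map<Integer, Long> txnIdByDataset = new HashMap<>();

        // an operator passes its datasetId when asking which transaction to log under
        public synchronized long getTxnId(int datasetId) {
            return txnIdByDataset.computeIfAbsent(datasetId, id -> TxnIdGenerator.next());
        }

        // stand-in for the real transaction id source
        static final class TxnIdGenerator {
            private static long counter = 0;
            static synchronized long next() {
                return ++counter;
            }
        }
    }

As Steven notes above, the datasetId key alone would not disambiguate two transactions in the same job that touch the same dataset.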
>
> Cheers,
> Abdullah.
>
> > On Nov 17, 2017, at 11:32 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >
> > Well, we've solved the problem when there is only one transaction id per
> > job. The operators can fetch the transaction ids from the
> > JobEventListenerFactory (you can find this in master now). The issue is,
> > when we are trying to combine multiple job specs into one feed job, the
> > operators at runtime don't have a memory of which "job spec" they
> > originally belonged to, which would tell them which of the transaction
> > ids they should use.
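For context, a minimal sketch of the single-transaction-per-job case described here, with illustrative names (the class in master is the JobEventListenerFactory; this is not its real API):

    // Hypothetical sketch of the one-transaction-per-job case.
    public class SingleTxnJobEventListenerFactory {
        private final long txnId; // the single transaction this job runs under

        public SingleTxnJobEventListenerFactory(long txnId) {
            this.txnId = txnId;
        }

        // every operator in the job gets the same answer, so no key is needed;
        // after merging several job specs there would be several ids and an
        // operator has no record of which original spec (and id) was "its own"
        public long getTxnId() {
            return txnId;
        }
    }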
> >
> > Steven
> >
> > On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <bamousaa@gmail.com>
> > wrote:
> >
> >>
> >> I think that this works, and it seems like the question is how different
> >> operators in the job can get their transaction ids.
> >>
> >> ~Abdullah.
> >>
> >>> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >>>
> >>> From the conversation, it seems like nobody has the full picture to
> >>> propose the design?
> >>> For deployed jobs, the idea is to use the same job specification but
> >>> create a new Hyracks job and Asterix Transaction for each execution.
> >>>
> >>> Steven
> >>>
> >>> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <bamousaa@gmail.com>
> >>> wrote:
> >>>
> >>>> I can e-meet anytime (moved to Sunnyvale). We can also look at a
> >>>> proposed design and see if it can work.
> >>>> Back to my question, how were you planning to change the transaction id
> >>>> if we forget about the case with multiple datasets (feed job)?
> >>>>
> >>>>
> >>>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >>>>>
> >>>>> Maybe it would be good to have a meeting about this with all
> >>>>> interested parties?
> >>>>>
> >>>>> I can be on-campus at UCI on Tuesday if that would be a good day
> >>>>> to meet.
> >>>>>
> >>>>> Steven
> >>>>>
> >>>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <bamousaa@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Also, I was wondering how you would do the same for a single dataset
> >>>>>> (non-feed). How would you get the transaction id and change it when
> >>>>>> you re-run?
> >>>>>>
> >>>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hubailmor@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> For atomic transactions, the change was merged yesterday. For
> >>>>>>> entity level transactions, it should be a very small change.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Murtadha
> >>>>>>>
> >>>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <bamousaa@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> I understand that is not the case right now, but is that what
> >>>>>>>> you're working on?
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Abdullah.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hubailmor@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> A transaction context can register multiple primary indexes.
> >>>>>>>>> Since each entity commit log contains the dataset id, you can
> >>>>>>>>> decrement the active operations on the operation tracker
> >>>>>>>>> associated with that dataset id.
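A hedged sketch of the mechanism described here, using made-up names (CommitNotifier, activeOpsByDataset); the point is only that the dataset id carried in each entity-commit log record is enough to pick the right operation tracker to decrement:

    // Illustrative sketch, not the actual AsterixDB operation tracker code.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;

    public class CommitNotifier {
        // one active-operation counter per dataset registered with the transaction context
        private final Map<Integer, AtomicInteger> activeOpsByDataset = new ConcurrentHashMap<>();

        public void register(int datasetId) {
            activeOpsByDataset.computeIfAbsent(datasetId, id -> new AtomicInteger());
        }

        public void onEntityOperationStarted(int datasetId) {
            activeOpsByDataset.get(datasetId).incrementAndGet();
        }

        // called by the log flusher after an entity-commit record is durable;
        // the record itself carries the dataset id, so even a transaction that
        // spans several datasets can be attributed to the right tracker
        public void onEntityCommitFlushed(int datasetIdFromLogRecord) {
            activeOpsByDataset.get(datasetIdFromLogRecord).decrementAndGet();
        }
    }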
> >>>>>>>>>
> >>>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <bamousaa@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to
> >>>>>>>>> know. Moreover, the reason for the multiple transaction ids in
> >>>>>>>>> feeds is not simply because we compile them differently.
> >>>>>>>>>
> >>>>>>>>> How would a commit operator know which dataset's active operation
> >>>>>>>>> counter to decrement if they share the same id, for example?
> >>>>>>>>>
> >>>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xikuiw@uci.edu> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
> >>>>>>>>>> mappings for the jobs and transactions, except for the feeds.
> >>>>>>>>>>
> >>>>>>>>>> @Abdullah, after some digging into the code, I think probably we
> >>>>>>>>>> can use a single transaction id for the job which feeds multiple
> >>>>>>>>>> datasets? See if I can convince you. :)
> >>>>>>>>>>
> >>>>>>>>>> The reason we have multiple transaction ids in feeds is that we
> >>>>>>>>>> compile each connection job separately and combine them into a
> >>>>>>>>>> single feed job. A new transaction id is created and assigned to
> >>>>>>>>>> each connection job, thus for the combined job, we have to handle
> >>>>>>>>>> the different transactions as they are embedded in the connection
> >>>>>>>>>> job specifications. But, what if we create a single transaction
> >>>>>>>>>> id for the combined job? That transaction id will be embedded
> >>>>>>>>>> into each connection so they can write logs freely, but the
> >>>>>>>>>> transaction will be started and committed only once as there is
> >>>>>>>>>> only one feed job. In this way, we won't need
> >>>>>>>>>> multiTransactionJobletEventListener and the transaction id can be
> >>>>>>>>>> removed from the job specification easily as well (for Steven's
> >>>>>>>>>> change).
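A rough sketch of this proposal, using made-up names (FeedJobBuilder, ConnectionJob, CombinedFeedJob): one transaction id is created when the connection jobs are combined, every connection is stamped with it, and the transaction is started and committed once for the whole feed job.

    // Illustrative sketch of the single-transaction-per-feed-job idea.
    import java.util.List;
    import java.util.concurrent.atomic.AtomicLong;

    public class FeedJobBuilder {

        static final AtomicLong TXN_ID_SOURCE = new AtomicLong(); // stand-in for the real id source

        // minimal stand-in for a compiled connection job specification
        static final class ConnectionJob {
            long txnId;
        }

        // minimal stand-in for the combined feed job
        static final class CombinedFeedJob {
            final long txnId;
            final List<ConnectionJob> connections;
            CombinedFeedJob(long txnId, List<ConnectionJob> connections) {
                this.txnId = txnId;
                this.connections = connections;
            }
        }

        public static CombinedFeedJob combine(List<ConnectionJob> connections) {
            long sharedTxnId = TXN_ID_SOURCE.incrementAndGet(); // one txn for the whole feed job
            for (ConnectionJob c : connections) {
                c.txnId = sharedTxnId; // each connection writes its logs under the same txn
            }
            // the transaction is started once when the feed job starts and committed
            // once when it finishes, so no multi-transaction listener is needed
            return new CombinedFeedJob(sharedTxnId, connections);
        }
    }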
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Xikui
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dtabass@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I worry about deadlocks. The waits-for graph may not understand
> >>>>>>>>>>> that making t1 wait will also make t2 wait since they may share
> >>>>>>>>>>> a thread - right? Or do we have jobs and transactions separately
> >>>>>>>>>>> represented there now?
> >>>>>>>>>>>
> >>>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <bamousaa@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> We are using multiple transactions in a single job in the case
> >>>>>>>>>>>> of a feed, and I think that this is the correct way.
> >>>>>>>>>>>> Having a single job for a feed that feeds into multiple
> >>>>>>>>>>>> datasets is a good thing since job resources/feed resources are
> >>>>>>>>>>>> consolidated.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Here are some points:
> >>>>>>>>>>>> - We can't use the same transaction id to feed multiple
> >>>>>>>>>>>> datasets. The only other option is to have multiple jobs, each
> >>>>>>>>>>>> feeding a different dataset.
> >>>>>>>>>>>> - Having multiple jobs (in addition to the extra resources
> >>>>>>>>>>>> used, memory and CPU) would then force us to either read data
> >>>>>>>>>>>> from external sources multiple times, parse records multiple
> >>>>>>>>>>>> times, etc., or to have synchronization between the different
> >>>>>>>>>>>> jobs and the feed source within AsterixDB. IMO, this is far
> >>>>>>>>>>>> more complicated than having multiple transactions within a
> >>>>>>>>>>>> single job, and the costs far outweigh the benefits.
> >>>>>>>>>>>>
> >>>>>>>>>>>> P.S.
> >>>>>>>>>>>> We are also using this for bucket connections in Couchbase
> >>>>>>>>>>>> Analytics.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <tillw@apache.org> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If there are a number of issues with supporting multiple
> >>>>>>>>>>>>> transaction ids and no clear benefits/use-cases, I’d vote for
> >>>>>>>>>>>>> simplification :)
> >>>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot", and
> >>>>>>>>>>>>> so I think that its usefulness might be limited by the time
> >>>>>>>>>>>>> we’d find a use for this functionality.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> My 2c,
> >>>>>>>>>>>>> Till
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm separating the connections into different jobs in some
> >>>>>>>>>>>>>> of my experiments... but that was intended to be used for the
> >>>>>>>>>>>>>> experimental settings (i.e., not for master now)...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think the interesting question here is whether we want to
> >>>>>>>>>>>>>> allow one Hyracks job to carry multiple transactions. I
> >>>>>>>>>>>>>> personally think that should be allowed, as the transaction
> >>>>>>>>>>>>>> and the job are two separate concepts, but I couldn't find
> >>>>>>>>>>>>>> such use cases other than the feeds. Does anyone have a good
> >>>>>>>>>>>>>> example of this?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Another question is, if we do allow multiple transactions in
> >>>>>>>>>>>>>> a single Hyracks job, how do we enable the commit runtime to
> >>>>>>>>>>>>>> obtain the correct TXN id without having that embedded as
> >>>>>>>>>>>>>> part of the job specification?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Xikui
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <bamousaa@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I am curious as to how feeds will work without this?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> ~Abdullah.
> >>>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sjaco002@ucr.edu>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
> >>>>>>>>>>>>>>>> which allows for one Hyracks job to run multiple Asterix
> >>>>>>>>>>>>>>>> transactions together.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in the
> >>>>>>>>>>>>>>>> process of changing to no longer need this feature. As part
> >>>>>>>>>>>>>>>> of the work on pre-deploying job specifications to be used
> >>>>>>>>>>>>>>>> by multiple Hyracks jobs, I've been working on removing the
> >>>>>>>>>>>>>>>> transaction id from the job specifications, as we use a new
> >>>>>>>>>>>>>>>> transaction for each invocation of a deployed job.
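A speculative sketch of that flow, with illustrative names only (DeployedJobRunner and the parameter keys are not the actual API): the deployed specification carries no transaction id, and each invocation requests a fresh one at start time and passes it as a runtime parameter.

    // Hypothetical sketch of starting a pre-deployed job with a fresh transaction.
    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicLong;

    public class DeployedJobRunner {
        private static final AtomicLong TXN_IDS = new AtomicLong(); // stand-in id source

        // each run of the same pre-deployed spec gets its own transaction
        public static Map<String, String> startParametersFor(String deployedJobId) {
            long txnId = TXN_IDS.incrementAndGet();
            Map<String, String> params = new HashMap<>();
            params.put("deployedJobId", deployedJobId);
            params.put("txnId", Long.toString(txnId));
            return params; // passed at invocation time, not baked into the job spec
        }
    }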
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> There is currently no clear way to remove the transaction
> >>>>>>>>>>>>>>>> id from the job spec and keep the option for
> >>>>>>>>>>>>>>>> MultiTransactionJobletEventListenerFactory.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The question for the group is, do we see a need to maintain
> >>>>>>>>>>>>>>>> this class that will no longer be used by any current code?
> >>>>>>>>>>>>>>>> Or, in other words, is there a strong possibility that in
> >>>>>>>>>>>>>>>> the future we will want multiple transactions to share a
> >>>>>>>>>>>>>>>> single Hyracks job, meaning that it is worth figuring out
> >>>>>>>>>>>>>>>> how to maintain this class?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Steven
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
