asterixdb-dev mailing list archives

From Steven Jacobs <sjaco...@ucr.edu>
Subject Re: MultiTransactionJobletEventListenerFactory
Date Fri, 17 Nov 2017 19:32:46 GMT
Well, we've solved the problem for the case where there is only one
transaction id per job. The operators can fetch the transaction id from the
JobEventListenerFactory (you can find this in master now). The issue is
that when we combine multiple job specs into one feed job, the operators at
runtime have no memory of which "job spec" they originally belonged to,
which is what would tell them which of the transaction ids they should use.

Steven
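The distinction drawn above can be sketched in Java, the language AsterixDB and Hyracks are written in. This is a minimal illustration under stated assumptions, not the code in master: `JobTxnRegistry` is a hypothetical stand-in for the job-level listener that holds transaction ids, and the per-operator `connectionKey` is a hypothetical label for the "which job spec did I come from" memory that the runtime currently lacks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the job-level listener that owns a job's
// transaction ids. Illustrative only; not an AsterixDB/Hyracks API.
final class JobTxnRegistry {
    private final Map<String, Long> txnIdByConnection = new HashMap<>();

    void register(String connectionKey, long txnId) {
        txnIdByConnection.put(connectionKey, txnId);
    }

    // One transaction per job: any operator can simply ask for "the" id.
    long onlyTxnId() {
        if (txnIdByConnection.size() != 1) {
            throw new IllegalStateException("job has " + txnIdByConnection.size()
                    + " transactions; operator needs a connection key");
        }
        return txnIdByConnection.values().iterator().next();
    }

    // Several transactions per job: the operator must know which original
    // job spec (connection) it came from to pick the right id.
    long txnIdFor(String connectionKey) {
        Long txnId = txnIdByConnection.get(connectionKey);
        if (txnId == null) {
            throw new IllegalArgumentException("unknown connection: " + connectionKey);
        }
        return txnId;
    }
}

// Hypothetical commit operator; connectionKey is the memory of its origin
// (null means the job has a single transaction).
final class CommitOperatorSketch {
    private final String connectionKey;

    CommitOperatorSketch(String connectionKey) {
        this.connectionKey = connectionKey;
    }

    long resolveTxnId(JobTxnRegistry registry) {
        return connectionKey == null
                ? registry.onlyTxnId()
                : registry.txnIdFor(connectionKey);
    }
}
```

With a single registered id, an operator needs no key; once a combined feed job registers several ids, an operator without a key cannot resolve which one is its own.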

On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <bamousaa@gmail.com>
wrote:

>
> I think that this works, and it seems like the question is how different
> operators in the job can get their transaction ids.
>
> ~Abdullah.
>
> > On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >
> > From the conversation, it seems like nobody has the full picture to
> > propose the design?
> > For deployed jobs, the idea is to use the same job specification but
> > create a new Hyracks job and Asterix Transaction for each execution.
> >
> > Steven
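A minimal sketch of the deployed-job idea, assuming hypothetical classes (`DeployedSpecSketch`, `DeployedExecution`, `DeployedJobRunnerSketch`) rather than the actual Hyracks deployment API: the specification carries no transaction id, and each invocation allocates a fresh job id and a fresh transaction id from simple in-memory counters.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical deployed job specification. It deliberately holds no
// transaction id, so it can be reused across executions.
final class DeployedSpecSketch {
    final String name;

    DeployedSpecSketch(String name) {
        this.name = name;
    }
}

// One execution of a deployed spec: a fresh job id and a fresh
// transaction id (both hypothetical counters here).
final class DeployedExecution {
    final DeployedSpecSketch spec;
    final long jobId;
    final long txnId;

    DeployedExecution(DeployedSpecSketch spec, long jobId, long txnId) {
        this.spec = spec;
        this.jobId = jobId;
        this.txnId = txnId;
    }
}

final class DeployedJobRunnerSketch {
    private final AtomicLong nextJobId = new AtomicLong(1);
    private final AtomicLong nextTxnId = new AtomicLong(1);

    // Same spec every time; ids are allocated per invocation, not per spec.
    DeployedExecution invoke(DeployedSpecSketch spec) {
        return new DeployedExecution(spec, nextJobId.getAndIncrement(),
                nextTxnId.getAndIncrement());
    }
}
```

Keeping the transaction id out of the specification is what makes the spec reusable; the per-run state lives only in the execution record.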
> >
> > On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <bamousaa@gmail.com>
> > wrote:
> >
> >> I can e-meet anytime (moved to Sunnyvale). We can also look at a
> >> proposed design and see if it can work.
> >> Back to my question: how were you planning to change the transaction id
> >> if we forget about the case with multiple datasets (feed job)?
> >>
> >>
> >>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >>>
> >>> Maybe it would be good to have a meeting about this with all interested
> >>> parties?
> >>>
> >>> I can be on campus at UCI on Tuesday if that would be a good day to
> >>> meet.
> >>>
> >>> Steven
> >>>
> >>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <bamousaa@gmail.com>
> >>> wrote:
> >>>
> >>>> Also, I was wondering how you would do the same for a single dataset
> >>>> (non-feed). How would you get the transaction id and change it when
> >>>> you re-run?
> >>>>
> >>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hubailmor@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> For atomic transactions, the change was merged yesterday. For
> >>>>> entity-level transactions, it should be a very small change.
> >>>>>
> >>>>> Cheers,
> >>>>> Murtadha
> >>>>>
> >>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <bamousaa@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> I understand that is not the case right now, but is it the case in
> >>>>>> what you're working on?
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Abdullah.
> >>>>>>
> >>>>>>
> >>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hubailmor@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> A transaction context can register multiple primary indexes.
> >>>>>>> Since each entity commit log contains the dataset id, you can
> >>>>>>> decrement the active operations on the operation tracker
> >>>>>>> associated with that dataset id.
> >>>>>>>
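Murtadha's answer can be illustrated with a small sketch. All class names here are hypothetical and the counters are simplified; the point is only that, when one transaction id spans several datasets, the dataset id carried in each entity commit log is enough to pick which per-dataset operation tracker to decrement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity-commit log record carrying both the transaction id
// and the dataset id.
final class EntityCommitLog {
    final long txnId;
    final int datasetId;

    EntityCommitLog(long txnId, int datasetId) {
        this.txnId = txnId;
        this.datasetId = datasetId;
    }
}

// Hypothetical per-dataset counter of active operations.
final class OpTrackerSketch {
    private int active;

    void beginOperation() {
        active++;
    }

    void completeOperation() {
        if (active == 0) {
            throw new IllegalStateException("no active operations to complete");
        }
        active--;
    }

    int activeOperations() {
        return active;
    }
}

// Hypothetical transaction context that has registered several primary
// indexes (one tracker per dataset id).
final class TxnContextSketch {
    private final long txnId;
    private final Map<Integer, OpTrackerSketch> trackers = new HashMap<>();

    TxnContextSketch(long txnId) {
        this.txnId = txnId;
    }

    OpTrackerSketch register(int datasetId) {
        return trackers.computeIfAbsent(datasetId, id -> new OpTrackerSketch());
    }

    // The shared txn id says which transaction; the dataset id in the log
    // says which tracker to decrement.
    void onEntityCommit(EntityCommitLog log) {
        if (log.txnId != txnId) {
            throw new IllegalArgumentException("log belongs to another transaction");
        }
        OpTrackerSketch tracker = trackers.get(log.datasetId);
        if (tracker == null) {
            throw new IllegalArgumentException("dataset not registered: " + log.datasetId);
        }
        tracker.completeOperation();
    }
}
```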
> >>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <bamousaa@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Can you illustrate how a deadlock can happen? I am anxious to know.
> >>>>>>> Moreover, the reason for the multiple transaction ids in feeds is
> >>>>>>> not simply that we compile them differently.
> >>>>>>>
> >>>>>>> How would a commit operator know which dataset's active operation
> >>>>>>> counter to decrement if, for example, they share the same id?
> >>>>>>>
> >>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xikuiw@uci.edu> wrote:
> >>>>>>>>
> >>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
> >>>>>>>> mappings for the jobs and transactions, except for the feeds.
> >>>>>>>>
> >>>>>>>> @Abdullah, after some digging into the code, I think we can probably
> >>>>>>>> use a single transaction id for the job which feeds multiple
> >>>>>>>> datasets? See if I can convince you. :)
> >>>>>>>>
> >>>>>>>> The reason we have multiple transaction ids in feeds is that we
> >>>>>>>> compile each connection job separately and combine them into a
> >>>>>>>> single feed job. A new transaction id is created and assigned to
> >>>>>>>> each connection job; thus, for the combined job, we have to handle
> >>>>>>>> the different transactions as they are embedded in the connection
> >>>>>>>> job specifications. But what if we create a single transaction id
> >>>>>>>> for the combined job? That transaction id will be embedded into
> >>>>>>>> each connection so they can write logs freely, but the transaction
> >>>>>>>> will be started and committed only once, as there is only one feed
> >>>>>>>> job. In this way, we won't need multiTransactionJobletEventListener,
> >>>>>>>> and the transaction id can be removed from the job specification
> >>>>>>>> easily as well (for Steven's change).
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Xikui
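A hedged sketch of Xikui's proposal, using hypothetical classes: one transaction id is created for the combined feed job, every connection writes log records tagged with that same id, and the transaction is begun and committed exactly once. It models only the bookkeeping, not real logging or commit processing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical log record written by one connection of the feed job.
final class FeedLogRecord {
    final long txnId;
    final int datasetId;

    FeedLogRecord(long txnId, int datasetId) {
        this.txnId = txnId;
        this.datasetId = datasetId;
    }
}

// Hypothetical combined feed job under the single-transaction-id proposal.
final class CombinedFeedJobSketch {
    private final long txnId;                   // one id shared by every connection
    private final List<Integer> targetDatasets; // one connection per dataset
    private final List<FeedLogRecord> log = new ArrayList<>();
    private int begins;
    private int commits;

    CombinedFeedJobSketch(long txnId, List<Integer> targetDatasets) {
        this.txnId = txnId;
        this.targetDatasets = new ArrayList<>(targetDatasets);
    }

    void run() {
        begins++; // the transaction is started once for the whole job
        for (int datasetId : targetDatasets) {
            // Each connection embeds the same txn id and writes its own logs;
            // the dataset id still tells the records apart.
            log.add(new FeedLogRecord(txnId, datasetId));
        }
        commits++; // and committed once, since there is only one feed job
    }

    int begins() { return begins; }

    int commits() { return commits; }

    List<FeedLogRecord> log() { return Collections.unmodifiableList(log); }
}
```

Because each record still carries its dataset id, this design remains compatible with decrementing per-dataset operation trackers under a shared transaction id.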
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dtabass@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> I worry about deadlocks. The waits-for graph may not understand
> >>>>>>>>> that making t1 wait will also make t2 wait, since they may share
> >>>>>>>>> a thread, right? Or do we have jobs and transactions represented
> >>>>>>>>> separately there now?
> >>>>>>>>>
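To make Mike's concern concrete, here is a generic waits-for graph with depth-first cycle detection; it is a textbook sketch with hypothetical names, not AsterixDB's deadlock detector. Suppose (hypothetically) t1 waits on a lock held by t3, and t3 waits on one held by t2. A graph that tracks only transactions sees t1 -> t3 -> t2 and no cycle. If t1 and t2 share a thread, blocking t1 also blocks t2, which amounts to an implicit edge t2 -> t1; only when that edge is added does the cycle appear.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal waits-for graph over transaction names, with DFS cycle detection.
// Illustrative only; not AsterixDB's lock manager.
final class WaitsForGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    // waiter -> holder: "waiter is blocked until holder releases".
    void addWait(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
        edges.computeIfAbsent(holder, k -> new HashSet<>());
    }

    boolean hasCycle() {
        Set<String> finished = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : edges.keySet()) {
            if (reachesCycle(node, finished, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean reachesCycle(String node, Set<String> finished, Set<String> onPath) {
        if (onPath.contains(node)) {
            return true; // back edge: node is already on the current DFS path
        }
        if (finished.contains(node)) {
            return false;
        }
        onPath.add(node);
        for (String next : edges.getOrDefault(node, Collections.emptySet())) {
            if (reachesCycle(next, finished, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        finished.add(node);
        return false;
    }
}
```

The detector itself is sound; the risk is that a graph keyed only by transactions never receives the thread-sharing edge, so it cannot see the deadlock.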
> >>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <bamousaa@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> We are using multiple transactions in a single job in the case of
> >>>>>>>>>> feeds, and I think that this is the correct way.
> >>>>>>>>>> Having a single job for a feed that feeds into multiple datasets
> >>>>>>>>>> is a good thing, since job resources/feed resources are
> >>>>>>>>>> consolidated.
> >>>>>>>>>>
> >>>>>>>>>> Here are some points:
> >>>>>>>>>> - We can't use the same transaction id to feed multiple datasets.
> >>>>>>>>>>   The only other option is to have multiple jobs, each feeding a
> >>>>>>>>>>   different dataset.
> >>>>>>>>>> - Having multiple jobs (in addition to the extra resources used,
> >>>>>>>>>>   memory and CPU) would then force us either to read data from
> >>>>>>>>>>   external sources multiple times, parse records multiple times,
> >>>>>>>>>>   etc., or to have synchronization between the different jobs and
> >>>>>>>>>>   the feed source within AsterixDB. IMO, this is far more
> >>>>>>>>>>   complicated than having multiple transactions within a single
> >>>>>>>>>>   job, and the costs far outweigh the benefits.
> >>>>>>>>>>
> >>>>>>>>>> P.S.
> >>>>>>>>>> We are also using this for bucket connections in Couchbase
> >>>>>>>>>> Analytics.
> >>>>>>>>>>
> >>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <tillw@apache.org>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> If there are a number of issues with supporting multiple
> >>>>>>>>>>> transaction ids and no clear benefits/use cases, I’d vote for
> >>>>>>>>>>> simplification :)
> >>>>>>>>>>> Also, code that’s not being used has a tendency to "rot", and so
> >>>>>>>>>>> I think that its usefulness might be limited by the time we’d
> >>>>>>>>>>> find a use for this functionality.
> >>>>>>>>>>>
> >>>>>>>>>>> My 2c,
> >>>>>>>>>>> Till
> >>>>>>>>>>>
> >>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm separating the connections into different jobs in some of
> >>>>>>>>>>>> my experiments... but that was intended to be used for the
> >>>>>>>>>>>> experimental settings (i.e., not for master now)...
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think the interesting question here is whether we want to
> >>>>>>>>>>>> allow one Hyracks job to carry multiple transactions. I
> >>>>>>>>>>>> personally think that should be allowed, as the transaction and
> >>>>>>>>>>>> the job are two separate concepts, but I couldn't find such use
> >>>>>>>>>>>> cases other than the feeds. Does anyone have a good example of
> >>>>>>>>>>>> this?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Another question is: if we do allow multiple transactions in a
> >>>>>>>>>>>> single Hyracks job, how do we enable the commit runtime to obtain
> >>>>>>>>>>>> the correct TXN id without having it embedded as part of the job
> >>>>>>>>>>>> specification?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Xikui
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <bamousaa@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I am curious: how will feeds work without this?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ~Abdullah.
> >>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sjaco002@ucr.edu>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
> >>>>>>>>>>>>>> which allows one Hyracks job to run multiple Asterix
> >>>>>>>>>>>>>> transactions together.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This class is only used by feeds, and feeds are in the process
> >>>>>>>>>>>>>> of changing so that they no longer need this feature. As part
> >>>>>>>>>>>>>> of the work on pre-deploying job specifications to be used by
> >>>>>>>>>>>>>> multiple Hyracks jobs, I've been working on removing the
> >>>>>>>>>>>>>> transaction id from the job specifications, as we use a new
> >>>>>>>>>>>>>> transaction for each invocation of a deployed job.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> There is currently no clear way to remove the transaction id
> >>>>>>>>>>>>>> from the job spec and keep the option for
> >>>>>>>>>>>>>> MultiTransactionJobletEventListenerFactory.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The question for the group is: do we see a need to maintain
> >>>>>>>>>>>>>> this class, which will no longer be used by any current code?
> >>>>>>>>>>>>>> In other words, is there a strong possibility that in the
> >>>>>>>>>>>>>> future we will want multiple transactions to share a single
> >>>>>>>>>>>>>> Hyracks job, meaning that it is worth figuring out how to
> >>>>>>>>>>>>>> maintain this class?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Steven