asterixdb-dev mailing list archives

From Steven Jacobs <sjaco...@ucr.edu>
Subject Re: MultiTransactionJobletEventListenerFactory
Date Fri, 17 Nov 2017 19:21:22 GMT
From the conversation, it seems like nobody has the full picture to propose
the design?
For deployed jobs, the idea is to use the same job specification but create
a new Hyracks job and Asterix Transaction for each execution.
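
A minimal sketch of this idea, not the actual Hyracks or AsterixDB code. The names JobSpec, Execution, and run are hypothetical and exist only for illustration; the point is that the specification carries no transaction id, and each execution gets a fresh job id and transaction id.

```java
import java.util.concurrent.atomic.AtomicLong;

public class DeployedJobSketch {
    /** Hypothetical stand-in for a pre-deployed job specification; it holds no transaction id. */
    static final class JobSpec {
        final String name;

        JobSpec(String name) {
            this.name = name;
        }
    }

    /** Hypothetical record of one execution: the shared spec plus per-run ids. */
    static final class Execution {
        final JobSpec spec;
        final long jobId;
        final long txnId;

        Execution(JobSpec spec, long jobId, long txnId) {
            this.spec = spec;
            this.jobId = jobId;
            this.txnId = txnId;
        }
    }

    // Assumed id generators; a real system would obtain these from its own managers.
    private static final AtomicLong NEXT_JOB_ID = new AtomicLong();
    private static final AtomicLong NEXT_TXN_ID = new AtomicLong();

    /** Reuses the same spec but allocates a new job id and transaction id per call. */
    static Execution run(JobSpec spec) {
        return new Execution(spec, NEXT_JOB_ID.incrementAndGet(), NEXT_TXN_ID.incrementAndGet());
    }

    public static void main(String[] args) {
        JobSpec spec = new JobSpec("deployed-job");
        Execution first = run(spec);
        Execution second = run(spec);
        System.out.println("same spec: " + (first.spec == second.spec));
        System.out.println("distinct txn ids: " + (first.txnId != second.txnId));
    }
}
```

Because the ids are allocated at run time, nothing in the spec needs to change between invocations.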

Steven

On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <bamousaa@gmail.com>
wrote:

> I can e-meet anytime (moved to Sunnyvale). We can also look at a proposed
> design and see if it can work.
> Back to my question, how were you planning to change the transaction id if
> we forget about the case with multiple datasets (feed job)?
>
>
> > On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >
> > Maybe it would be good to have a meeting about this with all interested
> > parties?
> >
> > I can be on-campus at UCI on Tuesday if that would be a good day to meet.
> >
> > Steven
> >
> > On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <bamousaa@gmail.com>
> > wrote:
> >
> >> Also, I was wondering how you would do the same for a single dataset
> >> (non-feed). How would you get the transaction id and change it when you
> >> re-run?
> >>
> >> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hubailmor@gmail.com> wrote:
> >>
> >>> For atomic transactions, the change was merged yesterday. For entity-level
> >>> transactions, it should be a very small change.
> >>>
> >>> Cheers,
> >>> Murtadha
> >>>
> >>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <bamousaa@gmail.com> wrote:
> >>>>
> >>>> I understand that is not the case right now, but is that what you're working on?
> >>>>
> >>>> Cheers,
> >>>> Abdullah.
> >>>>
> >>>>
> >>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hubailmor@gmail.com> wrote:
> >>>>>
> >>>>> A transaction context can register multiple primary indexes.
> >>>>> Since each entity commit log contains the dataset id, you can decrement
> >>>>> the active operations on the operation tracker associated with that
> >>>>> dataset id.
> >>>>>
> >>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <bamousaa@gmail.com> wrote:
> >>>>>
> >>>>>  Can you illustrate how a deadlock can happen? I am anxious to know.
> >>>>>  Moreover, the reason for the multiple transaction ids in feeds is not
> >>>>>  simply because we compile them differently.
> >>>>>
> >>>>>  How would a commit operator know which dataset's active operation
> >>>>>  counter to decrement if they share the same id, for example?
> >>>>>
> >>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xikuiw@uci.edu> wrote:
> >>>>>>
> >>>>>> Yes. That deadlock could happen. Currently, we have one-to-one mappings
> >>>>>> for the jobs and transactions, except for the feeds.
> >>>>>>
> >>>>>> @Abdullah, after some digging into the code, I think probably we can use
> >>>>>> a single transaction id for the job which feeds multiple datasets? See if
> >>>>>> I can convince you. :)
> >>>>>>
> >>>>>> The reason we have multiple transaction ids in feeds is that we compile
> >>>>>> each connection job separately and combine them into a single feed job. A
> >>>>>> new transaction id is created and assigned to each connection job, thus for
> >>>>>> the combined job, we have to handle the different transactions as they
> >>>>>> are embedded in the connection job specifications. But, what if we create a
> >>>>>> single transaction id for the combined job? That transaction id will be
> >>>>>> embedded into each connection so they can write logs freely, but the
> >>>>>> transaction will be started and committed only once as there is only one
> >>>>>> feed job. In this way, we won't need multiTransactionJobletEventListener
> >>>>>> and the transaction id can be removed from the job specification easily as
> >>>>>> well (for Steven's change).
> >>>>>>
> >>>>>> Best,
> >>>>>> Xikui
> >>>>>>
> >>>>>>
> >>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dtabass@gmail.com> wrote:
> >>>>>>>
> >>>>>>> I worry about deadlocks.  The waits-for graph may not understand that
> >>>>>>> making t1 wait will also make t2 wait since they may share a thread -
> >>>>>>> right?  Or do we have jobs and transactions separately represented there
> >>>>>>> now?
> >>>>>>>
> >>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <bamousaa@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> We are using multiple transactions in a single job in the case of feeds,
> >>>>>>>> and I think that this is the correct way.
> >>>>>>>> Having a single job for a feed that feeds into multiple datasets is a good
> >>>>>>>> thing since job resources/feed resources are consolidated.
> >>>>>>>>
> >>>>>>>> Here are some points:
> >>>>>>>> - We can't use the same transaction id to feed multiple datasets. The only
> >>>>>>>> other option is to have multiple jobs each feeding a different dataset.
> >>>>>>>> - Having multiple jobs (in addition to the extra resources used, memory
> >>>>>>>> and CPU) would then force us to either read data from external sources
> >>>>>>>> multiple times, parse records multiple times, etc.,
> >>>>>>>> or to have synchronization between the different jobs and the
> >>>>>>>> feed source within asterixdb. IMO, this is far more complicated than having
> >>>>>>>> multiple transactions within a single job, and the costs far outweigh the
> >>>>>>>> benefits.
> >>>>>>>>
> >>>>>>>> P.S,
> >>>>>>>> We are also using this for bucket connections in Couchbase Analytics.
> >>>>>>>>
> >>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <tillw@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>> If there are a number of issues with supporting multiple transaction ids
> >>>>>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
> >>>>>>>>> Also, code that’s not being used has a tendency to "rot" and so I think
> >>>>>>>>> that its usefulness might be limited by the time we’d find a use for
> >>>>>>>>> this functionality.
> >>>>>>>>>
> >>>>>>>>> My 2c,
> >>>>>>>>> Till
> >>>>>>>>>
> >>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>>>>
> >>>>>>>>>> I'm separating the connections into different jobs in some of my
> >>>>>>>>>> experiments... but that was intended to be used for the experimental
> >>>>>>>>>> settings (i.e., not for master now)...
> >>>>>>>>>>
> >>>>>>>>>> I think the interesting question here is whether we want to allow one
> >>>>>>>>>> Hyracks job to carry multiple transactions. I personally think that should
> >>>>>>>>>> be allowed as the transaction and job are two separate concepts, but I
> >>>>>>>>>> couldn't find such use cases other than the feeds. Does anyone have a good
> >>>>>>>>>> example on this?
> >>>>>>>>>>
> >>>>>>>>>> Another question is, if we do allow multiple transactions in a single
> >>>>>>>>>> Hyracks job, how do we enable the commit runtime to obtain the correct
> >>>>>>>>>> TXN id without having that embedded as part of the job specification?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Xikui
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <bamousaa@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I am curious as to how feed will work without this?
> >>>>>>>>>>>
> >>>>>>>>>>> ~Abdullah.
> >>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which allows
> >>>>>>>>>>>> for one Hyracks job to run multiple Asterix transactions together.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This class is only used by feeds, and feeds are in the process of changing
> >>>>>>>>>>>> to no longer need this feature. As part of the work in pre-deploying job
> >>>>>>>>>>>> specifications to be used by multiple Hyracks jobs, I've been working on
> >>>>>>>>>>>> removing the transaction id from the job specifications, as we use a new
> >>>>>>>>>>>> transaction for each invocation of a deployed job.
> >>>>>>>>>>>>
> >>>>>>>>>>>> There is currently no clear way to remove the transaction id from the job
> >>>>>>>>>>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The question for the group is, do we see a need to maintain this class that
> >>>>>>>>>>>> will no longer be used by any current code? Or, in other words, is there a
> >>>>>>>>>>>> strong possibility that in the future we will want multiple transactions to
> >>>>>>>>>>>> share a single Hyracks job, meaning that it is worth figuring out how to
> >>>>>>>>>>>> maintain this class?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Steven
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>
