asterixdb-dev mailing list archives

From abdullah alamoudi <bamou...@gmail.com>
Subject Re: MultiTransactionJobletEventListenerFactory
Date Fri, 17 Nov 2017 19:55:36 GMT
Right now, they can't, so datasetId can be safely used.
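
A minimal sketch of how option 1 (quoted below) might look, assuming that no two transactions in one job touch the same dataset. `DatasetTxnRegistry` and its methods are hypothetical names for illustration only; they are not the JobEventListenerFactory API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not an AsterixDB class.
// Maps each datasetId to the transaction id its operators should use.
final class DatasetTxnRegistry {
    private final Map<Integer, Long> txnIdByDataset = new HashMap<>();

    // Registers the transaction id for a dataset. Rejects a second
    // transaction on the same dataset, since the lookup key would
    // no longer be unique in that case.
    void register(int datasetId, long txnId) {
        Long previous = txnIdByDataset.putIfAbsent(datasetId, txnId);
        if (previous != null) {
            throw new IllegalStateException(
                    "dataset " + datasetId + " already uses transaction " + previous);
        }
    }

    // Called by an operator at runtime: it knows its datasetId,
    // so it can ask for the matching transaction id.
    long txnIdFor(int datasetId) {
        Long txnId = txnIdByDataset.get(datasetId);
        if (txnId == null) {
            throw new IllegalArgumentException("no transaction for dataset " + datasetId);
        }
        return txnId;
    }
}
```

If multiple transactions per dataset were ever allowed in one job, the duplicate check above is where this approach would break.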
> On Nov 17, 2017, at 11:51 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> 
> For option 1, I think the dataset id is not a unique identifier. Couldn't
> multiple transactions in one job work on the same dataset?
> 
> Steven
> 
> On Fri, Nov 17, 2017 at 11:38 AM, abdullah alamoudi <bamousaa@gmail.com>
> wrote:
> 
>> So, there are three options to do this:
>> 1. Each of these operators works on a specific dataset, so we can pass
>> the datasetId to the JobEventListenerFactory when requesting the
>> transaction id.
>> 2. We make one transaction work for multiple datasets by using a map from
>> datasetId to primary opTracker and use it when reporting commits by the log
>> flusher thread.
>> 3. Prevent a job from having multiple transactions. (For the record, I
>> dislike this option since the price we pay is very high IMO)
>> 
>> Cheers,
>> Abdullah.
>> 
>>> On Nov 17, 2017, at 11:32 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>> 
>>> Well, we've solved the problem when there is only one transaction id per
>>> job. The operators can fetch the transaction ids from the
>>> JobEventListenerFactory (you can find this in master now). The issue is,
>>> when we are trying to combine multiple job specs into one feed job, the
>>> operators at runtime don't have a memory of which "job spec" they
>>> originally belonged to, which could tell them which of the transaction
>>> ids they should use.
>>> 
>>> Steven
>>> 
>>> On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <bamousaa@gmail.com>
>>> wrote:
>>> 
>>>> 
>>>> I think that this works, and it seems like the question is how different
>>>> operators in the job can get their transaction ids.
>>>> 
>>>> ~Abdullah.
>>>> 
>>>>> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>> 
>>>>> From the conversation, it seems like nobody has the full picture to
>>>>> propose the design?
>>>>> For deployed jobs, the idea is to use the same job specification but
>>>>> create a new Hyracks job and Asterix Transaction for each execution.
>>>>> 
>>>>> Steven
>>>>> 
>>>>> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <bamousaa@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I can e-meet anytime (moved to Sunnyvale). We can also look at a
>>>>>> proposed design and see if it can work
>>>>>> Back to my question, how were you planning to change the transaction
>>>>>> id if we forget about the case with multiple datasets (feed job)?
>>>>>> 
>>>>>> 
>>>>>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>>> 
>>>>>>> Maybe it would be good to have a meeting about this with all
>>>>>>> interested parties?
>>>>>>> 
>>>>>>> I can be on-campus at UCI on Tuesday if that would be a good day to
>>>>>>> meet.
>>>>>>> 
>>>>>>> Steven
>>>>>>> 
>>>>>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <bamousaa@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Also, was wondering how would you do the same for a single dataset
>>>>>>>> (non-feed). How would you get the transaction id and change it when
>>>>>>>> you re-run?
>>>>>>>> 
>>>>>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hubailmor@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> For atomic transactions, the change was merged yesterday. For entity
>>>>>>>>> level transactions, it should be a very small change.
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Murtadha
>>>>>>>>> 
>>>>>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <bamousaa@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> I understand that is not the case right now, but is that what you're
>>>>>>>>>> working on?
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> Abdullah.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hubailmor@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> A transaction context can register multiple primary indexes.
>>>>>>>>>>> Since each entity commit log contains the dataset id, you can
>>>>>>>>>>> decrement the active operations on the operation tracker associated
>>>>>>>>>>> with that dataset id.
>>>>>>>>>>> 
>>>>>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <bamousaa@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to know.
>>>>>>>>>>> Moreover, the reason for the multiple transaction ids in feeds is
>>>>>>>>>>> not simply because we compile them differently.
>>>>>>>>>>> 
>>>>>>>>>>> How would a commit operator know which dataset active operation
>>>>>>>>>>> counter to decrement if they share the same id for example?
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xikuiw@uci.edu> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
>>>>>>>>>>>> mappings for the jobs and transactions, except for the feeds.
>>>>>>>>>>>> 
>>>>>>>>>>>> @Abdullah, after some digging into the code, I think probably we
>>>>>>>>>>>> can use a single transaction id for the job which feeds multiple
>>>>>>>>>>>> datasets? See if I can convince you. :)
>>>>>>>>>>>> 
>>>>>>>>>>>> The reason we have multiple transaction ids in feeds is that we
>>>>>>>>>>>> compile each connection job separately and combine them into a
>>>>>>>>>>>> single feed job. A new transaction id is created and assigned to
>>>>>>>>>>>> each connection job, thus for the combined job, we have to handle
>>>>>>>>>>>> the different transactions as they are embedded in the connection
>>>>>>>>>>>> job specifications. But, what if we create a single transaction id
>>>>>>>>>>>> for the combined job? That transaction id will be embedded into
>>>>>>>>>>>> each connection so they can write logs freely, but the transaction
>>>>>>>>>>>> will be started and committed only once as there is only one feed
>>>>>>>>>>>> job. In this way, we won't need
>>>>>>>>>>>> multiTransactionJobletEventListener and the transaction id can be
>>>>>>>>>>>> removed from the job specification easily as well (for Steven's
>>>>>>>>>>>> change).
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Xikui
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dtabass@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I worry about deadlocks. The waits-for graph may not understand
>>>>>>>>>>>>> that making t1 wait will also make t2 wait since they may share a
>>>>>>>>>>>>> thread - right? Or do we have jobs and transactions separately
>>>>>>>>>>>>> represented there now?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <bamousaa@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We are using multiple transactions in a single job in case of
>>>>>>>>>>>>>> feed and I think that this is the correct way.
>>>>>>>>>>>>>> Having a single job for a feed that feeds into multiple datasets
>>>>>>>>>>>>>> is a good thing since job resources/feed resources are
>>>>>>>>>>>>>> consolidated.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Here are some points:
>>>>>>>>>>>>>> - We can't use the same transaction id to feed multiple datasets.
>>>>>>>>>>>>>> The only other option is to have multiple jobs each feeding a
>>>>>>>>>>>>>> different dataset.
>>>>>>>>>>>>>> - Having multiple jobs (in addition to the extra resources used,
>>>>>>>>>>>>>> memory and CPU) would then force us to either read data from
>>>>>>>>>>>>>> external sources multiple times, parse records multiple times,
>>>>>>>>>>>>>> etc., or to have synchronization between the different jobs and
>>>>>>>>>>>>>> the feed source within asterixdb. IMO, this is far more
>>>>>>>>>>>>>> complicated than having multiple transactions within a single job
>>>>>>>>>>>>>> and the cost far outweighs the benefits.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> P.S.,
>>>>>>>>>>>>>> We are also using this for bucket connections in Couchbase
>>>>>>>>>>>>>> Analytics.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <tillw@apache.org>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If there are a number of issues with supporting multiple
>>>>>>>>>>>>>>> transaction ids and no clear benefits/use-cases, I’d vote for
>>>>>>>>>>>>>>> simplification :)
>>>>>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot" and so
>>>>>>>>>>>>>>> I think that its usefulness might be limited by the time we’d
>>>>>>>>>>>>>>> find a use for this functionality.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> My 2c,
>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I'm separating the connections into different jobs in some of
>>>>>>>>>>>>>>>> my experiments... but that was intended to be used for the
>>>>>>>>>>>>>>>> experimental settings (i.e., not for master now)...
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think the interesting question here is whether we want to
>>>>>>>>>>>>>>>> allow one Hyracks job to carry multiple transactions. I
>>>>>>>>>>>>>>>> personally think that should be allowed as the transaction and
>>>>>>>>>>>>>>>> job are two separate concepts, but I couldn't find such use
>>>>>>>>>>>>>>>> cases other than the feeds. Does anyone have a good example on
>>>>>>>>>>>>>>>> this?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Another question is, if we do allow multiple transactions in a
>>>>>>>>>>>>>>>> single Hyracks job, how do we enable commit runtime to obtain
>>>>>>>>>>>>>>>> the correct TXN id without having that embedded as part of the
>>>>>>>>>>>>>>>> job specification.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Xikui
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <bamousaa@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I am curious as to how feed will work without this?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> ~Abdullah.
>>>>>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sjaco002@ucr.edu>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
>>>>>>>>>>>>>>>>>> which allows for one Hyracks job to run multiple Asterix
>>>>>>>>>>>>>>>>>> transactions together.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in the process of
>>>>>>>>>>>>>>>>>> changing to no longer need this feature. As part of the work in
>>>>>>>>>>>>>>>>>> pre-deploying job specifications to be used by multiple Hyracks
>>>>>>>>>>>>>>>>>> jobs, I've been working on removing the transaction id from the
>>>>>>>>>>>>>>>>>> job specifications, as we use a new transaction for each
>>>>>>>>>>>>>>>>>> invocation of a deployed job.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> There is currently no clear way to remove the transaction id
>>>>>>>>>>>>>>>>>> from the job spec and keep the option for
>>>>>>>>>>>>>>>>>> MultiTransactionJobletEventListenerFactory.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The question for the group is, do we see a need to maintain this
>>>>>>>>>>>>>>>>>> class that will no longer be used by any current code? Or, in
>>>>>>>>>>>>>>>>>> other words, is there a strong possibility that in the future we
>>>>>>>>>>>>>>>>>> will want multiple transactions to share a single Hyracks job,
>>>>>>>>>>>>>>>>>> meaning that it is worth figuring out how to maintain this class?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 
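
Option 2 in the quoted thread (one transaction covering several datasets, with commits reported per dataset) can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchOpTracker` and `SketchTxnContext` are hypothetical names, not AsterixDB classes, and the real operation tracker and log flusher do considerably more.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a dataset's primary operation tracker.
final class SketchOpTracker {
    private final AtomicInteger activeOperations = new AtomicInteger();

    // An operator is about to modify the dataset.
    void beforeOperation() {
        activeOperations.incrementAndGet();
    }

    // An entity commit for this dataset has been flushed to the log.
    void afterCommit() {
        if (activeOperations.decrementAndGet() < 0) {
            activeOperations.incrementAndGet(); // undo the invalid decrement
            throw new IllegalStateException("commit without a matching operation");
        }
    }

    int active() {
        return activeOperations.get();
    }
}

// Hypothetical transaction context shared by all datasets of one job.
final class SketchTxnContext {
    private final Map<Integer, SketchOpTracker> trackerByDataset = new ConcurrentHashMap<>();

    void registerPrimaryIndex(int datasetId, SketchOpTracker tracker) {
        trackerByDataset.put(datasetId, tracker);
    }

    // The commit log record carries the dataset id, so the flusher can
    // pick the tracker to decrement even though the transaction id is shared.
    void onEntityCommitFlushed(int datasetId) {
        SketchOpTracker tracker = trackerByDataset.get(datasetId);
        if (tracker == null) {
            throw new IllegalArgumentException("dataset " + datasetId + " not registered");
        }
        tracker.afterCommit();
    }
}
```

The point of the map is that the dataset id, not the transaction id, selects the counter to decrement.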

