asterixdb-dev mailing list archives

From abdullah alamoudi <bamou...@gmail.com>
Subject Re: MultiTransactionJobletEventListenerFactory
Date Fri, 17 Nov 2017 19:10:32 GMT
I can e-meet anytime (I moved to Sunnyvale). We can also look at a proposed design and see if it can work.
Back to my question: how were you planning to change the transaction id if we forget about the case with multiple datasets (the feed job)?


> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sjaco002@ucr.edu> wrote:
> 
> Maybe it would be good to have a meeting about this with all interested
> parties?
> 
> I can be on-campus at UCI on Tuesday if that would be a good day to meet.
> 
> Steven
> 
> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <bamousaa@gmail.com>
> wrote:
> 
>> Also, I was wondering how you would do the same for a single dataset
>> (non-feed). How would you get the transaction id and change it when you
>> re-run?
>> 
>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hubailmor@gmail.com> wrote:
>> 
>>> For atomic transactions, the change was merged yesterday. For entity
>>> level transactions, it should be a very small change.
>>> 
>>> Cheers,
>>> Murtadha
>>> 
>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>> 
>>>> I understand that is not the case right now, but is that what you're working on?
>>>> 
>>>> Cheers,
>>>> Abdullah.
>>>> 
>>>> 
>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hubailmor@gmail.com> wrote:
>>>>> 
>>>>> A transaction context can register multiple primary indexes.
>>>>> Since each entity commit log contains the dataset id, you can decrement
>>>>> the active operations on the operation tracker associated with that dataset id.
>>>>> 
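For illustration, here is a minimal sketch of the mechanism described above, using hypothetical names rather than the actual AsterixDB classes: the dataset id carried by each entity commit log record selects the operation tracker whose active-operation count is decremented.

    // Hypothetical sketch -- names are illustrative, not the real AsterixDB classes.
    // The dataset id carried by an entity-commit log record selects the matching
    // operation tracker, whose active-operation count is then decremented.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class EntityCommitSketch {
        // one operation tracker per dataset, keyed by dataset id
        private final Map<Integer, OpTracker> trackersByDatasetId = new ConcurrentHashMap<>();

        void onEntityCommitLog(int datasetId) {
            OpTracker tracker = trackersByDatasetId.get(datasetId);
            if (tracker != null) {
                tracker.completeOperation(); // decrement active operations for that dataset
            }
        }

        static class OpTracker {
            private int activeOps;

            synchronized void beginOperation() {
                activeOps++;
            }

            synchronized void completeOperation() {
                activeOps--;
                notifyAll(); // wake anyone waiting for the dataset to become idle
            }
        }
    }
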
>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <bamousaa@gmail.com> wrote:
>>>>> 
>>>>>  Can you illustrate how a deadlock can happen? I am anxious to know.
>>>>>  Moreover, the reason for the multiple transaction ids in feeds is not
>>>>>  simply because we compile them differently.
>>>>> 
>>>>>  How would a commit operator know which dataset's active operation
>>>>>  counter to decrement if they share the same id, for example?
>>>>> 
>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xikuiw@uci.edu> wrote:
>>>>>> 
>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one mappings
>>>>>> for the jobs and transactions, except for the feeds.
>>>>>>
>>>>>> @Abdullah, after some digging into the code, I think we can probably use
>>>>>> a single transaction id for the job which feeds multiple datasets? See
>>>>>> if I can convince you. :)
>>>>>>
>>>>>> The reason we have multiple transaction ids in feeds is that we compile
>>>>>> each connection job separately and combine them into a single feed job.
>>>>>> A new transaction id is created and assigned to each connection job, so
>>>>>> for the combined job we have to handle the different transactions as
>>>>>> they are embedded in the connection job specifications. But what if we
>>>>>> create a single transaction id for the combined job? That transaction id
>>>>>> will be embedded into each connection so they can write logs freely, but
>>>>>> the transaction will be started and committed only once, as there is
>>>>>> only one feed job. In this way, we won't need the
>>>>>> multiTransactionJobletEventListener, and the transaction id can be
>>>>>> removed from the job specification easily as well (for Steven's change).
>>>>>> 
>>>>>> Best,
>>>>>> Xikui
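A minimal sketch of this proposal, with hypothetical names (not the real joblet event listener API): the combined feed job carries a single transaction id that is begun once when the job starts and committed or aborted once when the job finishes, regardless of how many datasets the connections feed.

    // Hypothetical sketch of a single-transaction joblet listener for the combined
    // feed job; names and signatures are illustrative only.
    class SingleTxnJobletListenerSketch {
        interface TxnManager {
            void begin(long txnId);
            void commit(long txnId);
            void abort(long txnId);
        }

        private final TxnManager txnManager;
        private final long txnId; // one id shared by every connection in the job

        SingleTxnJobletListenerSketch(TxnManager txnManager, long txnId) {
            this.txnManager = txnManager;
            this.txnId = txnId;
        }

        void jobletStart() {
            txnManager.begin(txnId); // started once, even with multiple target datasets
        }

        void jobletFinish(boolean failed) {
            if (failed) {
                txnManager.abort(txnId);
            } else {
                txnManager.commit(txnId); // committed once when the single feed job ends
            }
        }
    }

Since there is only one id per job in this sketch, it would no longer need to live in each connection's job specification.
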
>>>>>> 
>>>>>> 
>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dtabass@gmail.com> wrote:
>>>>>>> 
>>>>>>> I worry about deadlocks. The waits-for graph may not understand that
>>>>>>> making t1 wait will also make t2 wait, since they may share a thread -
>>>>>>> right? Or do we have jobs and transactions separately represented
>>>>>>> there now?
>>>>>>> 
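To make the shared-thread concern concrete, here is a small sketch with hypothetical names: if the lock manager blocks the operator thread on behalf of t1, then t2, running on the same thread, never reaches its own lock request, which a waits-for graph that only tracks transactions would not show.

    // Illustrative sketch only (hypothetical names). Two transactions are driven by
    // the same operator thread; blocking the thread for txn1 also stalls txn2.
    class SharedThreadSketch {
        interface LockManager {
            void lock(int entityId, long txnId) throws InterruptedException;
        }

        static void processFrame(LockManager lockManager, long txn1, long txn2)
                throws InterruptedException {
            lockManager.lock(101, txn1); // may block the whole thread on behalf of txn1 ...
            lockManager.lock(202, txn2); // ... so txn2 makes no progress either
        }
    }
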
>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <bamousaa@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> We are using multiple transactions in a single job in the case of
>>>>>>>> feeds, and I think that this is the correct way.
>>>>>>>> Having a single job for a feed that feeds into multiple datasets is a
>>>>>>>> good thing, since job resources/feed resources are consolidated.
>>>>>>>>
>>>>>>>> Here are some points:
>>>>>>>> - We can't use the same transaction id to feed multiple datasets. The
>>>>>>>> only other option is to have multiple jobs, each feeding a different
>>>>>>>> dataset.
>>>>>>>> - Having multiple jobs (in addition to the extra resources used,
>>>>>>>> memory and CPU) would then force us either to read data from external
>>>>>>>> sources multiple times, parse records multiple times, etc., or to
>>>>>>>> synchronize the different jobs with the feed source within AsterixDB.
>>>>>>>> IMO, this is far more complicated than having multiple transactions
>>>>>>>> within a single job, and the costs far outweigh the benefits.
>>>>>>>>
>>>>>>>> P.S.,
>>>>>>>> We are also using this for bucket connections in Couchbase Analytics.
>>>>>>>> 
>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <tillw@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>> If there are a number of issues with supporting multiple transaction
>>>>>>>>> ids and no clear benefits/use-cases, I’d vote for simplification :)
>>>>>>>>> Also, code that’s not being used has a tendency to "rot", and so I
>>>>>>>>> think that its usefulness might be limited by the time we’d find a
>>>>>>>>> use for this functionality.
>>>>>>>>> 
>>>>>>>>> My 2c,
>>>>>>>>> Till
>>>>>>>>> 
>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>>>>> 
>>>>>>>>>> I'm separating the connections into different jobs in some of my
>>>>>>>>>> experiments... but that was intended to be used for the experimental
>>>>>>>>>> settings (i.e., not for master now)...
>>>>>>>>>>
>>>>>>>>>> I think the interesting question here is whether we want to allow
>>>>>>>>>> one Hyracks job to carry multiple transactions. I personally think
>>>>>>>>>> that should be allowed, as the transaction and the job are two
>>>>>>>>>> separate concepts, but I couldn't find such use cases other than the
>>>>>>>>>> feeds. Does anyone have a good example of this?
>>>>>>>>>>
>>>>>>>>>> Another question is, if we do allow multiple transactions in a
>>>>>>>>>> single Hyracks job, how do we enable the commit runtime to obtain
>>>>>>>>>> the correct TXN id without having it embedded as part of the job
>>>>>>>>>> specification?
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Xikui
>>>>>>>>>> 
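One hedged possibility for that second question, sketched with hypothetical names rather than the actual Hyracks/AsterixDB API: supply the transaction id as a per-invocation job parameter and let the commit runtime resolve it when it opens, instead of embedding it in the job specification.

    // Hypothetical sketch: the TXN id arrives as a per-invocation job parameter
    // rather than being baked into the compiled job specification.
    class CommitRuntimeSketch {
        interface JobParameters {
            long getLong(String key);
        }

        private long txnId;

        void open(JobParameters invocationParams) {
            // resolved when the (possibly pre-deployed) job is invoked,
            // not when the job specification is compiled
            txnId = invocationParams.getLong("txn-id");
        }

        long currentTxnId() {
            return txnId; // used when writing entity/job commit log records
        }
    }
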
>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <bamousaa@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I am curious as to how feeds will work without this?
>>>>>>>>>>> 
>>>>>>>>>>> ~Abdullah.
>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
>>>>>>>>>>>> which allows for one Hyracks job to run multiple Asterix
>>>>>>>>>>>> transactions together.
>>>>>>>>>>>>
>>>>>>>>>>>> This class is only used by feeds, and feeds are in the process of
>>>>>>>>>>>> changing to no longer need this feature. As part of the work on
>>>>>>>>>>>> pre-deploying job specifications to be used by multiple Hyracks
>>>>>>>>>>>> jobs, I've been working on removing the transaction id from the
>>>>>>>>>>>> job specifications, as we use a new transaction for each
>>>>>>>>>>>> invocation of a deployed job.
>>>>>>>>>>>>
>>>>>>>>>>>> There is currently no clear way to remove the transaction id from
>>>>>>>>>>>> the job spec and keep the option for
>>>>>>>>>>>> MultiTransactionJobletEventListenerFactory.
>>>>>>>>>>>>
>>>>>>>>>>>> The question for the group is: do we see a need to maintain this
>>>>>>>>>>>> class that will no longer be used by any current code? Or, in
>>>>>>>>>>>> other words, is there a strong possibility that in the future we
>>>>>>>>>>>> will want multiple transactions to share a single Hyracks job,
>>>>>>>>>>>> meaning that it is worth figuring out how to maintain this class?
>>>>>>>>>>>> 
>>>>>>>>>>>> Steven
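For context, a hypothetical sketch of the idea behind such a class (the names and signatures are illustrative, not the real MultiTransactionJobletEventListenerFactory): the listener carries one transaction id per connected dataset and commits or aborts all of them when the single Hyracks job finishes.

    // Illustrative sketch only -- not the actual class. One transaction id per
    // feed connection; all of them are resolved when the one Hyracks job ends.
    import java.util.List;

    class MultiTxnListenerSketch {
        interface TxnManager {
            void begin(long txnId);
            void commit(long txnId);
            void abort(long txnId);
        }

        private final TxnManager txnManager;
        private final List<Long> txnIds; // one id per connected dataset

        MultiTxnListenerSketch(TxnManager txnManager, List<Long> txnIds) {
            this.txnManager = txnManager;
            this.txnIds = txnIds;
        }

        void jobletStart() {
            for (long txnId : txnIds) {
                txnManager.begin(txnId);
            }
        }

        void jobletFinish(boolean failed) {
            for (long txnId : txnIds) {
                if (failed) {
                    txnManager.abort(txnId);
                } else {
                    txnManager.commit(txnId);
                }
            }
        }
    }
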
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 

