asterixdb-dev mailing list archives

From abdullah alamoudi <bamou...@gmail.com>
Subject Re: MultiTransactionJobletEventListenerFactory
Date Fri, 17 Nov 2017 15:07:54 GMT
I understand that is not the case right now, but is that what you're working on?

Cheers,
Abdullah.


> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hubailmor@gmail.com> wrote:
> 
> A transaction context can register multiple primary indexes.
> Since each entity commit log contains the dataset id, you can decrement the active operations
> on the operation tracker associated with that dataset id.
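
A minimal Java sketch of the idea above, assuming one transaction context that registers a tracker per dataset. All class and method names here (OperationTracker, EntityCommitLog, TransactionContext, onEntityCommit) are hypothetical stand-ins for illustration and are not the actual AsterixDB classes or signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-dataset operation tracker.
final class OperationTracker {
    private int activeOperations;

    void beforeOperation() { activeOperations++; }

    void afterEntityCommit() { activeOperations--; }

    int getActiveOperations() { return activeOperations; }
}

// Hypothetical entity commit log record; it carries the dataset id.
final class EntityCommitLog {
    final long txnId;
    final int datasetId;

    EntityCommitLog(long txnId, int datasetId) {
        this.txnId = txnId;
        this.datasetId = datasetId;
    }
}

// One transaction context that registers several primary indexes.
final class TransactionContext {
    private final long txnId;
    private final Map<Integer, OperationTracker> trackersByDataset = new HashMap<>();

    TransactionContext(long txnId) { this.txnId = txnId; }

    void registerPrimaryIndex(int datasetId, OperationTracker tracker) {
        trackersByDataset.put(datasetId, tracker);
    }

    void beforeOperation(int datasetId) {
        trackerFor(datasetId).beforeOperation();
    }

    // The dataset id in the log record selects which counter to decrement,
    // even though every dataset shares the same transaction id.
    void onEntityCommit(EntityCommitLog log) {
        if (log.txnId != txnId) {
            throw new IllegalArgumentException("log belongs to another transaction");
        }
        trackerFor(log.datasetId).afterEntityCommit();
    }

    private OperationTracker trackerFor(int datasetId) {
        OperationTracker tracker = trackersByDataset.get(datasetId);
        if (tracker == null) {
            throw new IllegalStateException("dataset " + datasetId + " is not registered");
        }
        return tracker;
    }
}
```

In this sketch the commit path never needs a per-dataset transaction id; the lookup key is the dataset id taken from the log record.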
> 
> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <bamousaa@gmail.com> wrote:
> 
>    Can you illustrate how a deadlock can happen? I am anxious to know.
>    Moreover, the reason for the multiple transaction ids in feeds is not simply because
>    we compile them differently.
> 
>    How would a commit operator know which dataset's active operation counter to decrement
>    if they share the same id, for example?
> 
>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xikuiw@uci.edu> wrote:
>> 
>> Yes. That deadlock could happen. Currently, we have one-to-one mappings for
>> the jobs and transactions, except for the feeds.
>> 
>> @Abdullah, after some digging into the code, I think probably we can use a
>> single transaction id for the job which feeds multiple datasets? See if I
>> can convince you. :)
>> 
>> The reason we have multiple transaction ids in feeds is that we compile
>> each connection job separately and combine them into a single feed job. A
>> new transaction id is created and assigned to each connection job, thus for
>> the combined job, we have to handle the different transactions as they
>> are embedded in the connection job specifications. But, what if we create a
>> single transaction id for the combined job? That transaction id will be
>> embedded into each connection so they can write logs freely, but the
>> transaction will be started and committed only once as there is only one
>> feed job. In this way, we won't need multiTransactionJobletEventListener
>> and the transaction id can be removed from the job specification easily as
>> well (for Steven's change).
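
A rough Java sketch of the proposal above, assuming a combined feed job that owns a single transaction id and hands it to every connection. FeedConnection, CombinedFeedJob, and the string "events" are hypothetical names invented for illustration; they do not correspond to real AsterixDB or Hyracks classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical connection to one target dataset. It only borrows the
// transaction id so that it can tag the log records it writes.
final class FeedConnection {
    private final String datasetName;
    private final long txnId;

    FeedConnection(String datasetName, long txnId) {
        this.datasetName = datasetName;
        this.txnId = txnId;
    }

    String writeLog(String record) {
        return "log txn=" + txnId + " dataset=" + datasetName + " record=" + record;
    }
}

// Hypothetical combined feed job: one transaction id for all connections.
final class CombinedFeedJob {
    private final long txnId;
    private final List<FeedConnection> connections = new ArrayList<>();

    CombinedFeedJob(long txnId, List<String> datasetNames) {
        this.txnId = txnId;
        for (String name : datasetNames) {
            connections.add(new FeedConnection(name, txnId));
        }
    }

    // The transaction begins and commits once per job, while every
    // connection writes its own logs under the shared id.
    List<String> run(String record) {
        List<String> events = new ArrayList<>();
        events.add("begin txn=" + txnId);
        for (FeedConnection connection : connections) {
            events.add(connection.writeLog(record));
        }
        events.add("commit txn=" + txnId);
        return events;
    }
}
```

With two target datasets, a run in this sketch yields one begin event, two log events, and one commit event, so no multi-transaction listener is involved.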
>> 
>> Best,
>> Xikui
>> 
>> 
>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dtabass@gmail.com> wrote:
>> 
>>> I worry about deadlocks.  The waits-for graph may not understand that
>>> making t1 wait will also make t2 wait since they may share a thread -
>>> right?  Or do we have jobs and transactions separately represented there
>>> now?
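
To make the concern above concrete, here is a small Java sketch of a waits-for graph with cycle detection. It is only an illustration under assumptions: WaitsForGraph is a hypothetical class, not the AsterixDB lock manager, and the scenario (t1 and t2 share a thread, t3 is a transaction in another job) is invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical waits-for graph: an edge a -> b means "a waits for b".
final class WaitsForGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    void addWait(String waiter, String holder) {
        edges.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    // Depth-first search; a node reached again while still on the
    // current path means the graph has a cycle, i.e. a deadlock.
    boolean hasCycle() {
        Set<String> visited = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : edges.keySet()) {
            if (dfs(node, visited, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean dfs(String node, Set<String> visited, Set<String> onPath) {
        if (onPath.contains(node)) {
            return true;
        }
        if (!visited.add(node)) {
            return false; // already fully explored, no cycle through it
        }
        onPath.add(node);
        for (String next : edges.getOrDefault(node, Collections.emptySet())) {
            if (dfs(next, visited, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        return false;
    }
}
```

Suppose t1 waits for a lock held by t3, and t3 waits for a lock held by t2. With only those lock edges the graph has no cycle. If t1 and t2 share a thread, blocking t1 also stalls t2, which amounts to an extra edge t2 -> t1 that the lock table never records; adding it closes the cycle t1 -> t3 -> t2 -> t1.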
>>> 
>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <bamousaa@gmail.com> wrote:
>>> 
>>>> We are using multiple transactions in a single job in case of feed and I
>>>> think that this is the correct way.
>>>> Having a single job for a feed that feeds into multiple datasets is a good
>>>> thing since job resources/feed resources are consolidated.
>>>> 
>>>> Here are some points:
>>>> - We can't use the same transaction id to feed multiple datasets. The only
>>>> other option is to have multiple jobs each feeding a different dataset.
>>>> - Having multiple jobs (in addition to the extra resources used, memory
>>>> and CPU) would then force us to either read data from external sources
>>>> multiple times, parse records multiple times, etc.,
>>>> or to have synchronization between the different jobs and the
>>>> feed source within asterixdb. IMO, this is far more complicated than having
>>>> multiple transactions within a single job, and the costs far outweigh the
>>>> benefits.
>>>> 
>>>> P.S,
>>>> We are also using this for bucket connections in Couchbase Analytics.
>>>> 
>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <tillw@apache.org> wrote:
>>>>> 
>>>>> If there are a number of issues with supporting multiple transaction ids
>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
>>>>> Also, code that’s not being used has a tendency to "rot" and so I think
>>>>> that its usefulness might be limited by the time we’d find a use for
>>>>> this functionality.
>>>>> 
>>>>> My 2c,
>>>>> Till
>>>>> 
>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>> 
>>>>>> I'm separating the connections into different jobs in some of my
>>>>>> experiments... but that was intended to be used for the experimental
>>>>>> settings (i.e., not for master now)...
>>>>>> 
>>>>>> I think the interesting question here is whether we want to allow one
>>>>>> Hyracks job to carry multiple transactions. I personally think that should
>>>>>> be allowed as the transaction and job are two separate concepts, but I
>>>>>> couldn't find such use cases other than the feeds. Does anyone have a good
>>>>>> example of this?
>>>>>> 
>>>>>> Another question is, if we do allow multiple transactions in a single
>>>>>> Hyracks job, how do we enable the commit runtime to obtain the correct TXN id
>>>>>> without having that embedded as part of the job specification?
>>>>>> 
>>>>>> Best,
>>>>>> Xikui
>>>>>> 
>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>>>> 
>>>>>>> I am curious as to how feeds will work without this.
>>>>>>> 
>>>>>>> ~Abdullah.
>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sjaco002@ucr.edu> wrote:
>>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which allows
>>>>>>>> for one Hyracks job to run multiple Asterix transactions together.
>>>>>>>> 
>>>>>>>> This class is only used by feeds, and feeds are in the process of changing to
>>>>>>>> no longer need this feature. As part of the work in pre-deploying job
>>>>>>>> specifications to be used by multiple Hyracks jobs, I've been working on
>>>>>>>> removing the transaction id from the job specifications, as we use a new
>>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>> 
>>>>>>>> There is currently no clear way to remove the transaction id from the job
>>>>>>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
>>>>>>>> 
>>>>>>>> The question for the group is, do we see a need to maintain this class that
>>>>>>>> will no longer be used by any current code? Or, in other words, is there a
>>>>>>>> strong possibility that in the future we will want multiple transactions to
>>>>>>>> share a single Hyracks job, meaning that it is worth figuring out how to
>>>>>>>> maintain this class?
>>>>>>>> 
>>>>>>>> Steven
>>>>>>> 
>>>>>>> 
>>>> 
>>>> 
>>> 
> 
> 
> 
> 

