Mailing-List: contact dev-help@asterixdb.apache.org; run by ezmlm
Reply-To: dev@asterixdb.apache.org
Date: Fri, 17 Nov 2017 18:04:02 +0300
Subject: Re: MultiTransactionJobletEventListenerFactory
From: Murtadha Hubail
Message-ID: <321A4920-A635-4F5E-A648-6B437B9CBEF2@gmail.com>
Thread-Topic: MultiTransactionJobletEventListenerFactory
References: <7F92EF80-97A5-4B6F-BD8C-4C7514B96AD1@apache.org> <132CEC8D-7221-49F6-A355-13431921770A@gmail.com>
In-Reply-To: <132CEC8D-7221-49F6-A355-13431921770A@gmail.com>
archived-at: Fri, 17 Nov 2017 15:04:13 -0000

A transaction context can register multiple primary indexes.
Since each entity commit log contains the dataset id, you can decrement the active operations on
the operation tracker associated with that dataset id.

On 17/11/2017, 5:52 PM, "abdullah alamoudi" wrote:

    Can you illustrate how a deadlock can happen? I am anxious to know.
    Moreover, the reason for the multiple transaction ids in feeds is not simply because we compile them differently.

    How would a commit operator know which dataset active operation counter to decrement if they share the same id for example?

    > On Nov 16, 2017, at 9:46 PM, Xikui Wang wrote:
    >
    > Yes. That deadlock could happen. Currently, we have one-to-one mappings for
    > the jobs and transactions, except for the feeds.
    >
    > @Abdullah, after some digging into the code, I think probably we can use a
    > single transaction id for the job which feeds multiple datasets? See if I
    > can convince you. :)
    >
    > The reason we have multiple transaction ids in feeds is that we compile
    > each connection job separately and combine them into a single feed job.
    > A new transaction id is created and assigned to each connection job, thus for
    > the combined job, we have to handle the different transactions as they
    > are embedded in the connection job specifications. But, what if we create a
    > single transaction id for the combined job? That transaction id will be
    > embedded into each connection so they can write logs freely, but the
    > transaction will be started and committed only once as there is only one
    > feed job. In this way, we won't need multiTransactionJobletEventListener
    > and the transaction id can be removed from the job specification easily as
    > well (for Steven's change).
    >
    > Best,
    > Xikui
    >
    >
    > On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey wrote:
    >
    >> I worry about deadlocks. The waits-for graph may not understand that
    >> making t1 wait will also make t2 wait since they may share a thread -
    >> right? Or do we have jobs and transactions separately represented there
    >> now?
    >>
    >> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" wrote:
    >>
    >>> We are using multiple transactions in a single job in the case of feeds and I
    >>> think that this is the correct way.
    >>> Having a single job for a feed that feeds into multiple datasets is a good
    >>> thing since job resources/feed resources are consolidated.
    >>>
    >>> Here are some points:
    >>> - We can't use the same transaction id to feed multiple datasets. The only
    >>> other option is to have multiple jobs each feeding a different dataset.
    >>> - Having multiple jobs (in addition to the extra resources used, memory
    >>> and CPU) would then force us to either read data from external sources
    >>> multiple times, parse records multiple times, etc.,
    >>> or to have synchronization between the different jobs and the
    >>> feed source within asterixdb. IMO, this is far more complicated than having
    >>> multiple transactions within a single job and the cost far outweighs the
    >>> benefits.
    >>>
    >>> P.S.
    >>> We are also using this for bucket connections in Couchbase Analytics.
    >>>
    >>>> On Nov 16, 2017, at 2:57 PM, Till Westmann wrote:
    >>>>
    >>>> If there are a number of issues with supporting multiple transaction ids
    >>>> and no clear benefits/use-cases, I’d vote for simplification :)
    >>>> Also, code that’s not being used has a tendency to "rot" and so I think
    >>>> that its usefulness might be limited by the time we’d find a use for
    >>>> this functionality.
    >>>>
    >>>> My 2c,
    >>>> Till
    >>>>
    >>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
    >>>>
    >>>>> I'm separating the connections into different jobs in some of my
    >>>>> experiments... but that was intended to be used for the experimental
    >>>>> settings (i.e., not for master now)...
    >>>>>
    >>>>> I think the interesting question here is whether we want to allow one
    >>>>> Hyracks job to carry multiple transactions. I personally think that should
    >>>>> be allowed as the transaction and job are two separate concepts, but I
    >>>>> couldn't find such use cases other than the feeds. Does anyone have a good
    >>>>> example of this?
    >>>>>
    >>>>> Another question is, if we do allow multiple transactions in a single
    >>>>> Hyracks job, how do we enable the commit runtime to obtain the correct TXN id
    >>>>> without having that embedded as part of the job specification?
    >>>>>
    >>>>> Best,
    >>>>> Xikui
    >>>>>
    >>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <bamousaa@gmail.com>
    >>>>> wrote:
    >>>>>
    >>>>>> I am curious as to how feeds will work without this?
    >>>>>>
    >>>>>> ~Abdullah.
    >>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs wrote:
    >>>>>>>
    >>>>>>> Hi all,
    >>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which allows
    >>>>>>> for one Hyracks job to run multiple Asterix transactions together.
    >>>>>>>
    >>>>>>> This class is only used by feeds, and feeds are in the process of changing to
    >>>>>>> no longer need this feature. As part of the work in pre-deploying job
    >>>>>>> specifications to be used by multiple Hyracks jobs, I've been working on
    >>>>>>> removing the transaction id from the job specifications, as we use a new
    >>>>>>> transaction for each invocation of a deployed job.
    >>>>>>>
    >>>>>>> There is currently no clear way to remove the transaction id from the job
    >>>>>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
    >>>>>>>
    >>>>>>> The question for the group is, do we see a need to maintain this class that
    >>>>>>> will no longer be used by any current code? Or, in other words, is there a
    >>>>>>> strong possibility that in the future we will want multiple transactions to
    >>>>>>> share a single Hyracks job, meaning that it is worth figuring out how to
    >>>>>>> maintain this class?
    >>>>>>>
    >>>>>>> Steven
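
The suggestion at the top of this thread (one transaction context registering several primary indexes, with each entity commit log carrying the dataset id) could look roughly like the Java sketch below. It is only an illustration under stated assumptions: SingleTxnCommitSketch, OperationTracker, EntityCommitLog, TransactionContext and all of their methods are hypothetical names invented for this example, not the actual AsterixDB classes or signatures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these types are real AsterixDB APIs.
class SingleTxnCommitSketch {

    /** Counts in-flight operations for one dataset (stand-in for an operation tracker). */
    static final class OperationTracker {
        private final AtomicInteger activeOperations = new AtomicInteger();

        void beforeOperation() { activeOperations.incrementAndGet(); }
        void afterEntityCommit() { activeOperations.decrementAndGet(); }
        int activeOperations() { return activeOperations.get(); }
    }

    /** Minimal entity commit log record; it carries the dataset id. */
    static final class EntityCommitLog {
        final long txnId;
        final int datasetId;

        EntityCommitLog(long txnId, int datasetId) {
            this.txnId = txnId;
            this.datasetId = datasetId;
        }
    }

    /** One transaction context that registers one tracker per dataset id. */
    static final class TransactionContext {
        final long txnId;
        private final Map<Integer, OperationTracker> trackersByDataset = new HashMap<>();

        TransactionContext(long txnId) { this.txnId = txnId; }

        /** Registers (or returns) the tracker for the given dataset's primary index. */
        OperationTracker register(int datasetId) {
            return trackersByDataset.computeIfAbsent(datasetId, id -> new OperationTracker());
        }

        /** Uses the dataset id in the log record to pick the counter to decrement. */
        void onEntityCommit(EntityCommitLog log) {
            if (log.txnId != txnId) {
                throw new IllegalArgumentException("log belongs to another transaction");
            }
            OperationTracker tracker = trackersByDataset.get(log.datasetId);
            if (tracker == null) {
                throw new IllegalStateException("dataset " + log.datasetId + " not registered");
            }
            tracker.afterEntityCommit();
        }
    }

    public static void main(String[] args) {
        TransactionContext ctx = new TransactionContext(42L);
        OperationTracker ds1 = ctx.register(1);
        OperationTracker ds2 = ctx.register(2);
        ds1.beforeOperation();
        ds1.beforeOperation();
        ds2.beforeOperation();
        // A commit for dataset 1 only touches dataset 1's counter.
        ctx.onEntityCommit(new EntityCommitLog(42L, 1));
        System.out.println("dataset 1 active: " + ds1.activeOperations());
        System.out.println("dataset 2 active: " + ds2.activeOperations());
    }
}
```

In this sketch the transaction id is shared by both datasets, and the dataset id in the log record alone decides which counter is decremented. It says nothing about the deadlock concern raised earlier in the thread.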