Date: Mon, 28 Dec 2015 16:02:39 +0530
Subject: Re: Writing batches to database using Transactionable Store Output operator
From: Priyanka Gugale
To: dev@apex.incubator.apache.org
Hi,

Sorry if I was not clear, but I am proposing a MAX_SIZE per window, i.e. the
maximum number of tuples the operator could process in one window. The actual
size could be less than MAX_SIZE; there is no restriction on that.

-Priyanka

On Mon, Dec 28, 2015 at 3:22 PM, Chandni Singh wrote:
> How do you propose to restrict the no. of tuples processed in an
> application window to less than the batch size?
>
> I don't see a way to enforce that the batch size is never less than the
> number of tuples processed in an application window.
>
> On Mon, Dec 28, 2015 at 1:25 AM, Priyanka Gugale wrote:
>
> > Hi Chandni,
> >
> > How about restricting the tuples which can be processed per window? If
> > someone wants to process small and frequent batches, he can set the batch
> > size to some small value and also reduce the window size. This would build
> > some back pressure, of course, but that could be acceptable if one really
> > wants to restrict the batch size.
> > The thought was triggered while working on the Cassandra output operator.
> > Cassandra has problems processing batches larger than a certain size (I
> > don't recall the exact number right now). Other databases may want to
> > restrict the batch size for similar or other reasons.
> >
> > -Priyanka
> >
> > On Mon, Dec 28, 2015 at 2:46 PM, Chandni Singh wrote:
> >
> > > Priyanka,
> > >
> > > AbstractBatchTransactionableStore treats all tuples in one application
> > > window as a batch because it needs to store the tuples in the store
> > > exactly once.
> > >
> > > If there is more than one batch in an application window, then to store
> > > the tuples exactly once the window id needs to be written with every
> > > tuple as well, which is not that efficient. Therefore we take advantage
> > > of the transaction support by saving the window id just once (not with
> > > every tuple), but this necessitates treating all the tuples in the
> > > window as one batch.
> > >
> > > Every operator in a DAG can have its own application window size.
> > > So to reduce the size per batch, the application window attribute needs
> > > to be modified.
> > >
> > > Chandni
> > >
> > > On Mon, Dec 28, 2015 at 1:01 AM, Chinmay Kolhatkar
> > > <chinmay@datatorrent.com> wrote:
> > >
> > > > +1 for this.
> > > >
> > > > ~ Chinmay.
> > > >
> > > > On Mon, Dec 28, 2015 at 2:27 PM, Priyanka Gugale wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > In Malhar we have an operator,
> > > > > AbstractBatchTransactionableStoreOutputOperator, which creates
> > > > > batches based on the tuples received in a window. At the end of the
> > > > > window these batches are sent to the database for processing.
> > > > > There is no way to configure a MAX_SIZE for these batches. Depending
> > > > > on the input rate, the batch sizes can grow very large, and we might
> > > > > want to restrict the batch size.
> > > > >
> > > > > Any operator can extend the class and do batch management on its
> > > > > own, but I see it as a generic requirement, and IMO we should change
> > > > > the base class, i.e. the
> > > > > AbstractBatchTransactionableStoreOutputOperator class, to accept a
> > > > > MAX_SIZE for the batch from outside.
> > > > >
> > > > > Any opinion on this?
> > > > >
> > > > > -Priyanka
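For readers less familiar with the exactly-once scheme described in the thread, here is a minimal, self-contained Java sketch of it, assuming a store that can commit a batch of rows together with the last processed window id in one atomic transaction. All names (WindowBatchDemo, InMemoryTransactionalStore, BatchOutputOperator, commit) are hypothetical and are not Malhar's API; the in-memory list stands in for a real database, and the Apex operator lifecycle is reduced to plain beginWindow/processTuple/endWindow calls.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of "one batch per application window": tuples buffered
 * during a window are committed together with the window id in a single
 * transaction, so a replayed window can be detected and skipped.
 * This is NOT the Malhar API; it only illustrates the idea.
 */
public class WindowBatchDemo {

  /** Hypothetical store: rows and the last committed window id change together. */
  static class InMemoryTransactionalStore {
    final List<String> rows = new ArrayList<>();
    long committedWindowId = -1;

    /** Stands in for one database transaction writing the batch plus the window id. */
    void commit(List<String> batch, long windowId) {
      rows.addAll(batch);
      committedWindowId = windowId;
    }
  }

  /** Hypothetical output operator: the batch is everything received in one window. */
  static class BatchOutputOperator {
    final InMemoryTransactionalStore store;
    final List<String> batch = new ArrayList<>();
    long currentWindowId;

    BatchOutputOperator(InMemoryTransactionalStore store) {
      this.store = store;
    }

    void beginWindow(long windowId) {
      currentWindowId = windowId;
      batch.clear();
    }

    void processTuple(String tuple) {
      // Windows at or below the committed id were already written; ignore replays.
      if (currentWindowId > store.committedWindowId) {
        batch.add(tuple);
      }
    }

    void endWindow() {
      if (currentWindowId > store.committedWindowId) {
        // The window id is stored once per batch, not once per tuple.
        store.commit(new ArrayList<>(batch), currentWindowId);
      }
      batch.clear();
    }
  }

  public static void main(String[] args) {
    InMemoryTransactionalStore store = new InMemoryTransactionalStore();
    BatchOutputOperator op = new BatchOutputOperator(store);

    op.beginWindow(1);
    op.processTuple("a");
    op.processTuple("b");
    op.endWindow();

    // Simulate recovery: window 1 is replayed, then window 2 arrives.
    op.beginWindow(1);
    op.processTuple("a");
    op.processTuple("b");
    op.endWindow();
    op.beginWindow(2);
    op.processTuple("c");
    op.endWindow();

    // Prints: [a, b, c] committed=2
    System.out.println(store.rows + " committed=" + store.committedWindowId);
  }
}
```

If endWindow instead flushed one window in several commits (for example, every MAX_SIZE tuples), committedWindowId in this sketch would already equal the window id after the first partial commit, so a replay of that window after a failure would skip the tuples not yet written. That is the trade-off raised in the thread: splitting a window into several batches requires storing the window id with every tuple.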