From: Bright Chen
Date: Wed, 8 Mar 2017 09:54:41 -0800
Subject: Re: Sort Accumulation
To: dev@apex.apache.org

Hi Ajay,

I think sorting in getOutput() will probably cause that method to get stuck
because of the very high volume of computation. And since we still need to
persist the data, it will not help much to improve the performance of tuple
processing.

We could probably bucket the data by value range instead, for example:
- While processing tuples in a window: sort the data of the current window
  in memory.
- At end window: merge the sorted in-memory data into the buckets.

Thanks,
Bright

On Wed, Mar 8, 2017 at 8:51 AM, AJAY GUPTA wrote:

> Hi Thomas,
>
> I looked at TopN. The accumulate() of TopN is an O(n*k). Using similar
> approach for Sort will lead to an O(n^2) complexity.
> Since we have to sort all elements, we can do it in a single sort call in
> getOutput().
>
>
> On Wed, Mar 8, 2017 at 10:09 PM, Thomas Weise wrote:
>
> > Look at the existing topN accumulation. It should be a generalization,
> > where you don't have a limit.
> >
> >
> > On Wed, Mar 8, 2017 at 8:05 AM, AJAY GUPTA wrote:
> >
> > > Hi,
> > >
> > > I would like to propose the Sort Accumulation. The accumulation will be
> > > responsible for sorting the input POJO stream. The accumulation will
> > > require a comparator to compare and sort the input tuples. Another
> > > boolean parameter "sortDesc" will be used to decide sorting order.
> > >
> > > Let me know your views.
> > >
> > > Thanks,
> > > Ajay
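
For illustration, the bucketing idea from the top of this message (sort each window's data in memory, then merge it into value-range buckets at end window) could be sketched roughly as below. This is only a sketch under stated assumptions, not Apex or Malhar code: the class RangeBucketSortSketch, its method names, the use of int values, and the fixed bucketWidth are all hypothetical. It sorts the window buffer once at end window rather than keeping it sorted per tuple, and it keeps the buckets in memory, whereas a real operator would presumably persist them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from Apex or Malhar.
class RangeBucketSortSketch {
    // Assumed: every bucket covers a fixed-width range [start, start + bucketWidth).
    private final int bucketWidth;
    // Tuples received in the current window, not yet sorted.
    private final List<Integer> windowBuffer = new ArrayList<>();
    // Bucket start -> sorted values in that range; TreeMap iterates buckets in range order.
    private final TreeMap<Integer, List<Integer>> buckets = new TreeMap<>();

    public RangeBucketSortSketch(int bucketWidth) {
        if (bucketWidth <= 0) {
            throw new IllegalArgumentException("bucketWidth must be positive");
        }
        this.bucketWidth = bucketWidth;
    }

    // Per tuple: only buffer the value, so tuple processing stays cheap.
    public void processTuple(int value) {
        windowBuffer.add(value);
    }

    // End of window: sort the window's data, then merge each contiguous run
    // of values that falls into the same range into that range's bucket.
    public void endWindow() {
        Collections.sort(windowBuffer);
        int i = 0;
        while (i < windowBuffer.size()) {
            int start = bucketStart(windowBuffer.get(i));
            int j = i;
            while (j < windowBuffer.size() && bucketStart(windowBuffer.get(j)) == start) {
                j++;
            }
            // Copy the run because subList is only a view of windowBuffer.
            List<Integer> run = new ArrayList<>(windowBuffer.subList(i, j));
            buckets.merge(start, run, RangeBucketSortSketch::mergeSorted);
            i = j;
        }
        windowBuffer.clear();
    }

    // Output: concatenate the buckets in range order; no global sort is needed.
    public List<Integer> getOutput() {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> bucket : buckets.values()) {
            out.addAll(bucket);
        }
        return out;
    }

    // floorDiv keeps negative values in the correct range, e.g. -4 -> bucket -10 for width 10.
    private int bucketStart(int value) {
        return Math.floorDiv(value, bucketWidth) * bucketWidth;
    }

    // Standard two-way merge of two already sorted lists.
    static List<Integer> mergeSorted(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }
}
```

With this shape, the end-window merge only touches the buckets whose ranges received values in that window, and getOutput() reduces to a concatenation of the buckets rather than one large sort.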