Mailing-List: contact dev-help@flink.apache.org; run by ezmlm
Reply-To: dev@flink.apache.org
MIME-Version: 1.0
From: Stephan Ewen
Date: Fri, 12 Aug 2016 10:54:58 +0200
Subject: Re: Conceptual difference Windows and DataSet
To: "dev@flink.apache.org"
Content-Type:
 multipart/alternative; boundary=001a113f5fac8424610539dc09bb
archived-at: Fri, 12 Aug 2016 09:14:46 -0000

--001a113f5fac8424610539dc09bb
Content-Type: text/plain; charset=UTF-8

Hi Kevin!

The windows in Flink's DataStream API are organized by key. The reason is
that the windows are very flexible, and each key can form different windows
from the others (think of sessions per user: each session starts and stops
differently).

There has been discussion about introducing something like "aligned
windows". These types of windows would be the same across all keys and
could therefore be organized globally. One could even imagine that these
offer DataSet-like features. That is still a bit into the future, though.

Greetings,
Stephan

On Sat, Aug 6, 2016 at 11:58 PM, Theodore Vasiloudis <
theodoros.vasiloudis@gmail.com> wrote:

> Hello Kevin,
>
> I'm not very familiar with the stream API, but I think you can achieve
> what you want by mapping over your elements to turn the strings into
> one-item lists, so that you get a key-value pair of the form
> (K: String, V: (List[String], Int)). Then apply the window reduce
> function, which produces a data stream out of a windowed stream; there
> you combine your lists and sum the values. Again, it's not a great way
> to use reduce, since you are growing the list with each reduction.
>
> Regards,
> Theodore
>
> On Thu, Aug 4, 2016 at 1:36 AM, Kevin Jacobs wrote:
>
> > Hi,
> >
> > I have the following use case:
> >
> > 1. Group by a specific field.
> >
> > 2. Get a list of all messages belonging to the group.
> >
> > 3. Count the number of records in the group.
> >
> > With the use of DataSets, it is fairly easy to do this (see
> > http://stackoverflow.com/questions/38745446/apache-flink-sum-and-keep-grouped/38747685#38747685):
> >
> >   import org.apache.flink.api.scala._
> >   import org.apache.flink.util.Collector
> >
> >   val env = ExecutionEnvironment.getExecutionEnvironment
> >
> >   env.fromElements(("a-b", "data1", 1), ("a-c", "data2", 1),
> >                    ("a-b", "data3", 1))
> >     .groupBy(0)  // group by the first tuple field
> >     .reduceGroup {
> >       (it: Iterator[(String, String, Int)],
> >        out: Collector[(String, List[String], Int)]) =>
> >         // The iterator can be traversed only once, so materialize it.
> >         val group = it.toList
> >         if (group.nonEmpty)
> >           // Emit (key, all messages of the group, summed count).
> >           out.collect((group.head._1, group.map(_._2),
> >                        group.map(_._3).sum))
> >     }
> >
> > So, now I am moving to DataStreams (since the input is really a
> > DataStream). From my perspective, a Window should provide the same
> > functionality as a DataSet. This would simplify the process a lot:
> >
> > 1. Window the elements.
> >
> > 2. Apply the same operations as before.
> >
> > Is there a way in Flink to do so? Otherwise, I would like to think of
> > a solution to this problem.
> >
> > Regards,
> > Kevin
> >

--001a113f5fac8424610539dc09bb--
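
As a rough illustration of the point that windows are organized by key, the sketch below mimics the idea in plain Scala collections. It is not Flink code and uses no Flink API. The four-field event shape (with a timestamp added), the names `KeyedWindowSketch`, `windowStart` and `aggregate`, the sample data and the fixed-size (tumbling) windows are all assumptions made for this example. Each (key, window) pair is treated like one small group, to which the same "collect the messages and sum the counts" step from the DataSet example is applied.

```scala
// Conceptual sketch in plain Scala collections; this is NOT the Flink API.
object KeyedWindowSketch {
  // Hypothetical event shape: (key, message, count, timestampMillis).
  type Event = (String, String, Int, Long)

  // Start of the fixed-size window containing a (non-negative) timestamp.
  def windowStart(timestamp: Long, sizeMillis: Long): Long =
    timestamp - (timestamp % sizeMillis)

  // Group events by (key, window start), then collect the messages and
  // sum the counts of each group, as reduceGroup does per key for a DataSet.
  def aggregate(events: Seq[Event],
                sizeMillis: Long): Map[(String, Long), (List[String], Int)] =
    events
      .groupBy { case (key, _, _, ts) => (key, windowStart(ts, sizeMillis)) }
      .map { case (group, es) =>
        (group, (es.map(_._2).toList, es.map(_._3).sum))
      }

  def main(args: Array[String]): Unit = {
    // Made-up sample data: the last "a-b" event falls into a later window.
    val events = Seq(
      ("a-b", "data1", 1, 1000L),
      ("a-c", "data2", 1, 2000L),
      ("a-b", "data3", 1, 4000L),
      ("a-b", "data4", 1, 12000L)
    )
    aggregate(events, sizeMillis = 10000L).foreach(println)
  }
}
```

Fixed-size windows happen to line up across keys here, which is the simple case. Session windows, where each key starts and stops its windows at different times, would need a per-key window assignment instead of the shared `windowStart` function.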