flink-user mailing list archives

From Christopher Santiago <ch...@ninjametrics.com>
Subject Re: Multiple windows with large number of partitions
Date Mon, 02 May 2016 19:22:32 GMT
Hi Aljoscha,

Yes, there is still a high partition/window count, since I have to keyBy the
userid so that I get unique users.  I believe what is happening is that
the second window (the timeWindowAll) is either not getting all the results, or
the results from the previous window are changing while the second window is
running.  For a particular day, I can see the unique user count increase and
decrease while the job is running.

I can share the Eclipse project and the sample data file I am working from
with you if that would be helpful.

Thanks,
Chris

On Mon, May 2, 2016 at 12:55 AM, Aljoscha Krettek [via Apache Flink User
Mailing List archive.] <ml-node+s2336050n6601h71@n4.nabble.com> wrote:

> Hi,
> what do you mean by "still experiencing the same issues"? Is the key count
> still very high, i.e. 500k windows?
>
> For the watermark generation, specifying a lag of 2 days is very
> conservative. If the watermark is this conservative, I guess elements that
> are behind the watermark will never arrive, so you wouldn't need the
> late-element handling in your triggers. The late-element handling in
> Triggers is only required to compensate for the fact that the watermark can
> be a heuristic and is not always correct.
>
> Cheers,
> Aljoscha
>
> On Thu, 28 Apr 2016 at 21:24 Christopher Santiago <[hidden email]> wrote:
>
>> Hi Aljoscha,
>>
>>
>> Aljoscha Krettek wrote
>> > Is there a reason for keying on both the "date only" field and the
>> > "userid"? I think you should be fine by just specifying that you want
>> > 1-day windows on your timestamps.
>>
>> My mistake, this was from earlier tests that I had performed.  I removed it
>> and went to keyBy(2) and I am still experiencing the same issues.
>>
>>
>> Aljoscha Krettek wrote
>> > Also, do you have a timestamp extractor in place that takes the timestamp
>> > from your data and sets it as the internal timestamp field?
>>
>> Yes there is, it is from the BoundedOutOfOrdernessGenerator example:
>>
>>     public static class BoundedOutOfOrdernessGenerator
>>             implements AssignerWithPeriodicWatermarks<Tuple3<DateTime, String, String>> {
>>         private static final long serialVersionUID = 1L;
>>         private final long maxOutOfOrderness = Time.days(2).toMilliseconds();
>>         private long currentMaxTimestamp;
>>
>>         @Override
>>         public long extractTimestamp(Tuple3<DateTime, String, String> element,
>>                 long previousElementTimestamp) {
>>             long timestamp = element.f0.getMillis();
>>             currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
>>             return timestamp;
>>         }
>>
>>         @Override
>>         public Watermark getCurrentWatermark() {
>>             return new Watermark(currentMaxTimestamp - maxOutOfOrderness);
>>         }
>>     }
>>
>> Thanks,
>> Chris
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Multiple-windows-with-large-number-of-partitions-tp6521p6562.html
>> Sent from the Apache Flink User Mailing List archive. mailing list
>> archive at Nabble.com.
>>
>
>
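
As an aside on the watermark lag discussed in the quoted messages above, here
is a minimal plain-Java sketch (no Flink dependency) of how the size of the lag
determines whether an element ends up behind the watermark. The class
WatermarkLagSketch and its methods are hypothetical names made up for this
illustration, and treating "behind the watermark" as timestamp <= watermark is
a simplifying assumption rather than a statement of Flink's exact late-element
rule.

```java
import java.util.concurrent.TimeUnit;

/**
 * Illustration only: mimics the arithmetic of a bounded-out-of-orderness
 * watermark without using any Flink classes. All names are hypothetical.
 */
final class WatermarkLagSketch {
    private final long maxOutOfOrdernessMs;
    // Sentinel meaning "no element observed yet".
    private long currentMaxTimestamp = Long.MIN_VALUE;

    WatermarkLagSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Records an element timestamp and keeps the highest one seen so far. */
    void observe(long timestampMs) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, timestampMs);
    }

    /** Watermark = highest timestamp seen so far minus the configured lag. */
    long currentWatermark() {
        if (currentMaxTimestamp == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing observed yet
        }
        return currentMaxTimestamp - maxOutOfOrdernessMs;
    }

    /** Assumption: an element is "behind" when its timestamp <= the watermark. */
    boolean isBehindWatermark(long timestampMs) {
        return timestampMs <= currentWatermark();
    }

    public static void main(String[] args) {
        long day = TimeUnit.DAYS.toMillis(1);
        long hour = TimeUnit.HOURS.toMillis(1);

        WatermarkLagSketch twoDayLag = new WatermarkLagSketch(2 * day);
        WatermarkLagSketch oneHourLag = new WatermarkLagSketch(hour);

        long newest = 10 * day;              // newest element seen so far
        long straggler = newest - 6 * hour;  // arrives 6 hours out of order
        twoDayLag.observe(newest);
        oneHourLag.observe(newest);

        System.out.println("2-day lag, straggler behind watermark: "
                + twoDayLag.isBehindWatermark(straggler));
        System.out.println("1-hour lag, straggler behind watermark: "
                + oneHourLag.isBehindWatermark(straggler));
    }
}
```

With the 2-day lag from the thread, an element that is 6 hours out of order is
still ahead of the watermark, whereas with a 1-hour lag the same element would
be behind it. That is the sense in which a very conservative lag makes
late-element handling unnecessary, at the cost of windows firing later.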




--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Multiple-windows-with-large-number-of-partitions-tp6521p6626.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.