flink-user-zh mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From dixingxing85 <dixingxin...@163.com>
Subject Re: Flink streaming sql是否支持两层group by聚合
Date Sat, 18 Apr 2020 03:37:42 GMT
多谢benchao,
我这个作业的结果预期结果是每天只有一个结果,这个结果应该是越来越大的,比如:
20200417,86
20200417,90
20200417,130
20200417,131

而不应该是忽大忽小的,数字由大变小,这样的结果需求方肯定不能接受的:
20200417,90
20200417,86
20200417,130
20200417,86
20200417,131

我的疑问是内层的group by产生的retract流,会影响sink吗,我是在sink端打的日志。
如果flink支持这种两层group by的话,那这种结果变小的情况应该算是bug吧?

Sent from my iPhone

> On Apr 18, 2020, at 10:08, Benchao Li <libenchao@gmail.com> wrote:
> 
> 
> Hi,
> 
> 这个是支持的哈。
> 你看到的现象是因为group by会产生retract结果,也就是会先发送-[old],再发送+[new].
> 如果是两层的话,就成了:
> 第一层-[old], 第二层-[cur], +[old]
> 第一层+[new], 第二层[-old], +[new]
> 
> dixingxing85@163.com <dixingxing85@163.com> 于2020年4月18日周六 上午2:11写道:
>> 
>> Hi all:
>> 
>> 我们有个streaming sql得到的结果不正确,现象是sink得到的数据一会大一会小,我们想确认下,这是否是个bug,
或者flink还不支持这种sql。
>> 具体场景是:先group by A, B两个维度计算UV,然后再group by A 把维度B的UV
sum起来,对应的SQL如下:(A -> dt,  B -> pvareaid)
>> SELECT dt, SUM(a.uv) AS uv
>> FROM (
>>    SELECT dt, pvareaid, COUNT(DISTINCT cuid) AS uv
>>    FROM streaming_log_event
>>    WHERE action IN ('action1')
>>       AND pvareaid NOT IN ('pv1', 'pv2')
>>       AND pvareaid IS NOT NULL
>>    GROUP BY dt, pvareaid
>> ) a
>> GROUP BY dt;
>> sink接收到的数据对应日志为:
>> 2020-04-17 22:28:38,727    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1)
(GeneralRedisSinkFunction.invoke:169) - receive data(false,0,86,20200417)
>> 2020-04-17 22:28:38,727    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1)
(GeneralRedisSinkFunction.invoke:169) - receive data(true,0,130,20200417)
>> 2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1)
(GeneralRedisSinkFunction.invoke:169) - receive data(false,0,130,20200417)
>> 2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1)
(GeneralRedisSinkFunction.invoke:169) - receive data(true,0,86,20200417)
>> 2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1)
(GeneralRedisSinkFunction.invoke:169) - receive data(false,0,86,20200417)
>> 2020-04-17 22:28:39,328    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1)
(GeneralRedisSinkFunction.invoke:169) - receive data(true,0,131,20200417)
>> 
>> 我们使用的是1.7.2, 测试作业的并行度为1。
>> 这是对应的 issue: https://issues.apache.org/jira/browse/FLINK-17228
>> 
>> 
>> dixingxing85@163.com
> 
> 
> -- 
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenchao@gmail.com; libenchao@pku.edu.cn

Mime
  • Unnamed multipart/alternative (inline, 7-Bit, 0 bytes)
View raw message