flink-user-zh mailing list archives

From "dixingxing85@163.com" <dixingxin...@163.com>
Subject Does Flink streaming SQL support two-level GROUP BY aggregation?
Date Fri, 17 Apr 2020 18:10:56 GMT

Hi all:

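One of our streaming SQL jobs produces incorrect results: the values arriving at the sink fluctuate, sometimes larger and sometimes smaller. We would like to confirm whether this is a bug,
or whether Flink does not yet support this kind of SQL.
The scenario: first group by two dimensions, A and B, to compute UV, then group by A to sum the UV
across dimension B. The corresponding SQL is as follows (A -> dt, B -> pvareaid):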
SELECT dt, SUM(a.uv) AS uv
FROM (
   SELECT dt, pvareaid, COUNT(DISTINCT cuid) AS uv
   FROM streaming_log_event
   WHERE action IN ('action1')
      AND pvareaid NOT IN ('pv1', 'pv2')
      AND pvareaid IS NOT NULL
   GROUP BY dt, pvareaid
) a
GROUP BY dt;
The sink receives the following data, as shown in the log:
2020-04-17 22:28:38,727    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169)
- receive data(false,0,86,20200417)
2020-04-17 22:28:38,727    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169)
- receive data(true,0,130,20200417)
2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169)
- receive data(false,0,130,20200417)
2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169)
- receive data(true,0,86,20200417)
2020-04-17 22:28:39,327    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169)
- receive data(false,0,86,20200417)
2020-04-17 22:28:39,328    INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169)
- receive data(true,0,131,20200417)
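One possible reading of these records can be sketched in plain Python. This is an interpretation, not something the message confirms. It assumes that the first field is a retract flag (false = retract, true = add), that the third field is uv, and that each update of the inner aggregate reaches the outer SUM as a retract message followed by an add message. The function `outer_sum_updates` and the values 44 and 45 are hypothetical, chosen only so that the arithmetic matches the last four records.

```python
def outer_sum_updates(initial_sum, inner_changes):
    """Replay inner retract/add messages through a running SUM.

    inner_changes: list of (is_add, value) messages from the inner aggregate.
    Returns the (is_add, sum) messages the outer aggregate would emit.
    """
    total = initial_sum
    emitted = []
    for is_add, value in inner_changes:
        emitted.append((False, total))        # retract the previous sum
        total += value if is_add else -value  # apply the inner message
        emitted.append((True, total))         # emit the updated sum
    return emitted


# Hypothetical: one pvareaid's uv goes from 44 to 45 while the outer sum is 130.
inner = [(False, 44), (True, 45)]
for message in outer_sum_updates(130, inner):
    print(message)
```

Under these assumptions the emitted sequence is 130 retracted, 86 added, 86 retracted, 131 added, so a sink that writes every add message would briefly hold the intermediate value 86.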

We are using Flink 1.7.2, and the test job runs with a parallelism of 1.
The corresponding issue is: https://issues.apache.org/jira/browse/FLINK-17228




dixingxing85@163.com