flink-user-zh mailing list archives

From 宇张 <zhan...@akulaku.com>
Subject Flink 1.9 Blink SQL: primary key lost with deduplication, and low throughput when deduplication is combined with a temporal table join
Date Mon, 11 May 2020 03:14:28 GMT
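Hi,
I am using Blink SQL on Flink 1.9 for data transformation and have run into the following problems:
1. Using the ROW_NUMBER function loses the primary key.
2. Combining ROW_NUMBER with a temporal table join severely reduces job throughput. The SQL is as follows: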
-- In theory DISTINCT is not needed here, but Blink cannot extract the primary key
-- from this SQL, so validation fails; I added DISTINCT to work around that.
SELECT DISTINCT
  t1.id AS order_id,
  ...,
  DATE_FORMAT(t1.proctime, 'yyyy-MM-dd HH:mm:ss') AS etl_time
FROM (
  SELECT id, ..., proctime
  FROM (
    SELECT data.index0.id, ..., proctime,
           ROW_NUMBER() OVER (PARTITION BY data.index0.id ORDER BY es DESC) AS rowNum
    FROM installmentdb_t_line_item
  ) tmp
  WHERE rowNum <= 1
) t1
LEFT JOIN SNAP_T_OPEN_PAY_ORDER FOR SYSTEM_TIME AS OF t1.proctime t2
  ON t2.LI_ID = t1.id
LEFT JOIN SNAP_T_SALES_ORDER FOR SYSTEM_TIME AS OF t1.proctime t4
  ON t1.so_id = t4.ID
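The SQL above has very low throughput: it processes only a few records per second. Run separately, each of the two cases below meets the target. The temporal table join alone reaches a few thousand records per second, and ROW_NUMBER alone reaches tens of thousands. I do not understand why combining the two drops throughput to only a few records per second.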

SELECT DISTINCT
  t1.id AS order_id,
  ...,
  DATE_FORMAT(t1.proctime, 'yyyy-MM-dd HH:mm:ss') AS etl_time
FROM (
  SELECT id, ..., proctime
  FROM (
    SELECT data.index0.id, ..., proctime
    FROM installmentdb_t_line_item
  ) tmp
) t1
LEFT JOIN SNAP_T_OPEN_PAY_ORDER FOR SYSTEM_TIME AS OF t1.proctime t2
  ON t2.LI_ID = t1.id
LEFT JOIN SNAP_T_SALES_ORDER FOR SYSTEM_TIME AS OF t1.proctime t4
  ON t1.so_id = t4.ID

SELECT DISTINCT
  t1.id AS order_id,
  ...,
  DATE_FORMAT(t1.proctime, 'yyyy-MM-dd HH:mm:ss') AS etl_time
FROM (
  SELECT id, ..., proctime
  FROM (
    SELECT data.index0.id, ..., proctime,
           ROW_NUMBER() OVER (PARTITION BY data.index0.id ORDER BY es DESC) AS rowNum
    FROM installmentdb_t_line_item
  ) tmp
  WHERE rowNum <= 1
) t1
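
For readers unfamiliar with the pattern, the ROW_NUMBER() ... rowNum<=1 subquery keeps, for each id, the row with the largest es. The Python sketch below models only that semantics. It is not Flink's implementation, and the function name latest_per_key, the sample rows, and the "+" / "-" change markers are assumptions made for illustration. What it suggests is that a newer row for an already-seen id replaces a previously emitted result, so the subquery's output would be an updating stream rather than an append-only one.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, int]
Change = Tuple[str, Row]  # ("+", row) = emit, ("-", row) = retract (hypothetical markers)


def latest_per_key(rows: Iterable[Row]) -> List[Change]:
    """Model ROW_NUMBER() OVER (PARTITION BY id ORDER BY es DESC) with rowNum <= 1."""
    best: Dict[int, Row] = {}  # current top row for each id
    changes: List[Change] = []
    for row in rows:
        current = best.get(row["id"])
        if current is None:
            # First row seen for this id: it is the top row.
            changes.append(("+", row))
        elif row["es"] > current["es"]:
            # A newer row displaces the old top row: retract it, then emit the new one.
            changes.append(("-", current))
            changes.append(("+", row))
        else:
            # Older or equal es: the top row is unchanged (tie handling is an assumption).
            continue
        best[row["id"]] = row
    return changes


# Hypothetical sample input, in arrival order.
events = [
    {"id": 1, "es": 10},
    {"id": 2, "es": 11},
    {"id": 1, "es": 12},  # newer version of id 1
    {"id": 1, "es": 5},   # late, older version of id 1: ignored
]
changelog = latest_per_key(events)
for op, row in changelog:
    print(op, row)
```

In this model, four input rows produce four changes, one of which retracts an earlier result. Any operator placed after such a subquery would have to handle those retractions as well as the inserts.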