flink-user mailing list archives

From aj <ajainje...@gmail.com>
Subject Data Quality Library in Flink
Date Sat, 06 Jun 2020 06:23:23 GMT
Hello All,

I want to run some data quality analysis on streaming data, for example:

1. Fill rate of a particular column
2. How many events are going to the error queue because Avro schema
validation failed?
3. Various statistical measures of a column
4. Alert if a particular threshold is breached (e.g. if the fill rate is less
than 90% for a column)
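
As an illustration of the fill-rate and threshold items above, here is a minimal sketch in plain Python. It is not a Flink or deequ API: the names `FillRateAccumulator` and `check_fill_rate` are hypothetical, events are assumed to be dictionaries, and a value counts as "filled" when it is neither missing, `None`, nor an empty string. In a streaming job the same counters would be kept per window.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional, Tuple


@dataclass
class FillRateAccumulator:
    """Running counts for one column (hypothetical helper, not a Flink class)."""
    total: int = 0
    filled: int = 0

    def add(self, value: Optional[object]) -> None:
        # Every event counts toward the total; only non-empty values are "filled".
        self.total += 1
        if value is not None and value != "":
            self.filled += 1

    def fill_rate(self) -> float:
        # Assumption: an empty window is treated as fully filled, so it never alerts.
        return self.filled / self.total if self.total else 1.0


def check_fill_rate(
    events: Iterable[Mapping[str, object]],
    column: str,
    threshold: float = 0.90,
) -> Tuple[float, bool]:
    """Return (fill rate, alert flag) for one batch or window of events."""
    acc = FillRateAccumulator()
    for event in events:
        acc.add(event.get(column))  # a missing key is treated as an unfilled value
    rate = acc.fill_rate()
    return rate, rate < threshold


# Example: 2 of 4 events carry a user_id, so the 90% threshold is breached.
window = [{"user_id": "a"}, {"user_id": None}, {"user_id": "b"}, {}]
rate, alert = check_fill_rate(window, "user_id")
```

The accumulator is kept separate from the threshold check so that the same counts could also feed other per-column statistics.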

Is there any library built on top of Flink for data quality? While
searching, I found one built on top of Spark:
https://github.com/awslabs/deequ

It provides everything I am looking for.

-- 
Thanks & Regards,
Anuj Jain



<http://www.cse.iitm.ac.in/%7Eanujjain/>
