flink-user mailing list archives

From Rinat <r.shari...@cleverdata.ru>
Subject StreamingFileSink with hdfs less than 2.7
Date Mon, 17 Jun 2019 09:29:42 GMT
Hi mates, I decided to enable state persistence for our Flink jobs that write data into HDFS,
but ran into some trouble with that.

I’m trying to use StreamingFileSink with Cloudera Hadoop, version 2.6.5, whose file system
doesn’t provide the truncate method.

So the job fails immediately on startup, while initializing HadoopRecoverableWriter,
because that writer only works with Hadoop file systems of version 2.7 or later.

Do you have any plans to support recovery on Hadoop file systems that don’t provide the truncate
method, or is there a way to work around this limitation?

If no workaround exists, then the following behaviour would be good enough:

1. get the path of the file that should be restored
2. get the valid-length from the state
3. create a temporary directory and copy the restoring file into it, stopping once
   valid-length bytes have been written
4. replace the restoring file with the copy from the temporary directory
5. move the file to its final state
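
To make the proposal concrete, here is a minimal sketch of the copy-then-replace steps above. It is not Flink's recovery code: it uses plain java.nio on a local file system so that it runs without Hadoop, and the names TruncateByCopy, truncateByCopy and tmpDir are hypothetical. On HDFS the same steps would presumably go through FileSystem.open, FileSystem.create and FileSystem.rename, and since an HDFS rename does not overwrite an existing destination, the original file would likely have to be deleted before the rename.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only (not part of Flink or Hadoop).
final class TruncateByCopy {

    /**
     * Shortens {@code file} to {@code validLength} bytes without calling truncate:
     * copies the valid prefix into a temporary file, then swaps it into place.
     */
    static void truncateByCopy(Path file, long validLength, Path tmpDir) throws IOException {
        if (validLength < 0) {
            throw new IllegalArgumentException("validLength must be >= 0");
        }
        long size = Files.size(file);
        if (validLength > size) {
            throw new IOException("File is shorter than the valid length from the state");
        }
        if (validLength == size) {
            return; // nothing was written past the checkpoint, nothing to cut off
        }

        Files.createDirectories(tmpDir);
        Path tmp = Files.createTempFile(tmpDir, "restore-", ".tmp");

        try (InputStream in = Files.newInputStream(file);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            long remaining = validLength;
            // Copy exactly validLength bytes and ignore everything after them.
            while (remaining > 0) {
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    throw new IOException("Unexpected end of file while copying");
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        }

        // Replace the restoring file with the shortened copy.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The sketch covers everything up to replacing the restoring file; moving the file to its final state is left out. The cost is a full rewrite of the valid part of the file on every restore, which a real truncate would avoid.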

What do you think about it?

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

email: r.sharipov@cleverdata.ru <mailto:a.totmakov@cleverdata.ru>
mobile: +7 (925) 416-37-26

make your data clever
