flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] kl0u commented on issue #6824: [FLINK-9592][flink-connector-filesystem] added ability to hook file state changing
Date Fri, 12 Oct 2018 08:15:38 GMT
kl0u commented on issue #6824: [FLINK-9592][flink-connector-filesystem] added ability to hook
file state changing
URL: https://github.com/apache/flink/pull/6824#issuecomment-429244062
 
 
   Hi @kent2171 ,
   
   I had a look at the PR. I posted the same comment on the associated JIRA, but I am including
it here as well.
   
   In general, as I said earlier, I like the idea of having callbacks that notify the user when a
file changes state.
   As far as the design/implementation of the current PR is concerned, my comments are the
following:
   
   1) The `FileStateChangedCallback` seems to be pretty limiting, and was probably designed with
a specific use case in mind. It assumes that the user wants to do something with the underlying
file system when the file changes state (e.g. write a special file). But other use cases may
need to make a REST call, update a DB, or in general communicate with another system.
   
   Given the above, I would suggest that the function have an `open()` and a `close()`
method, each called once, which are responsible for allocating and freeing resources. The
`open()` could take the `flinkConfig` as an argument and initialize any long-lived
resources, e.g. connections to databases, a connection to the filesystem, etc., and the `close()`
should be responsible for freeing them. This will allow the sink to accommodate a broader
variety of use cases. As for the notification methods themselves, I do not yet have a definite
answer on what they should take as arguments, but I would include a `Context` among them.
This future-proofs the method: if we want to expose more information later, we can add it to
the `Context` rather than deprecating the existing API and creating a new one.
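
   To make the suggestion in 1) more concrete, here is a rough sketch of what such a callback
could look like. This is only an illustration, not a proposal for the final API: the names
`FileStateListener`, `FileState`, `Transition` and `RecordingListener` are hypothetical, the
three states are just examples, and a plain `Map` stands in for the Flink configuration so
that the snippet compiles without Flink on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical callback shape; none of these names come from the PR or from Flink.
interface FileStateListener {

    // Example lifecycle states of a part file (illustrative only).
    enum FileState { IN_PROGRESS, PENDING, FINISHED }

    // Read-only view of one transition. New accessors can be added here later
    // without changing the signature of onStateChanged().
    interface Context {
        String path();
        FileState previousState();
        FileState newState();
    }

    // Called once before any notification; acquire long-lived resources here.
    // A plain Map stands in for the Flink configuration object.
    void open(Map<String, String> config) throws Exception;

    // Called for every state transition of a file.
    void onStateChanged(Context context) throws Exception;

    // Called once at shutdown; release whatever open() acquired.
    void close() throws Exception;
}

// Minimal immutable Context implementation used by the demo below.
class Transition implements FileStateListener.Context {
    private final String path;
    private final FileStateListener.FileState from;
    private final FileStateListener.FileState to;

    Transition(String path, FileStateListener.FileState from, FileStateListener.FileState to) {
        this.path = path;
        this.from = from;
        this.to = to;
    }

    @Override public String path() { return path; }
    @Override public FileStateListener.FileState previousState() { return from; }
    @Override public FileStateListener.FileState newState() { return to; }
}

// Example listener: records events in memory, standing in for a REST call or a DB update.
class RecordingListener implements FileStateListener {
    private final List<String> events = new ArrayList<>();
    private String target;      // stand-in for a connection created in open()
    private boolean closed;

    @Override
    public void open(Map<String, String> config) {
        target = config.getOrDefault("target", "unknown");
    }

    @Override
    public void onStateChanged(Context ctx) {
        events.add(target + ": " + ctx.path() + " "
                + ctx.previousState() + " -> " + ctx.newState());
    }

    @Override
    public void close() {
        closed = true;          // a real listener would close its connection here
    }

    boolean isClosed() { return closed; }

    List<String> events() { return Collections.unmodifiableList(events); }
}

class CallbackDemo {
    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        listener.open(Collections.singletonMap("target", "audit-db"));
        try {
            listener.onStateChanged(new Transition("part-0-0",
                    FileStateListener.FileState.IN_PROGRESS, FileStateListener.FileState.PENDING));
            listener.onStateChanged(new Transition("part-0-0",
                    FileStateListener.FileState.PENDING, FileStateListener.FileState.FINISHED));
        } finally {
            listener.close();
        }
        listener.events().forEach(System.out::println);
    }
}
```

   The point of passing a `Context` instead of individual arguments is that something like a
bucket id or a timestamp could later be exposed as an extra accessor, and existing
implementations of `onStateChanged()` would keep compiling.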
   
   2) An important consideration to keep in mind: all of this is "best-effort" reporting of state
changes. For example, if a failure happens after a file transitions to its "final" state
but before the hook is called, then you will never get the notification. This behavior is aligned
with Flink's metric system, where metrics are not checkpointed. In our case, though, the scenario
described above is trickier to accommodate, as we are talking about integration with external
systems.
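
   The failure window described in 2) can be illustrated with a toy simulation. This is a
deliberately simplified sketch and not how the sink actually recovers: `Durable`, `Sink`
and `run` are hypothetical names, an in-memory set stands in for the file system, and a list
stands in for the external system that the hook would notify.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BestEffortDemo {

    // State that survives a restart (stand-in for the file system).
    static final class Durable {
        final Set<String> finished = new HashSet<>();
    }

    // Toy sink: finalizes a file first, then calls the hook.
    static final class Sink {
        private final Durable durable;
        private final List<String> notified;   // stand-in for an external system

        Sink(Durable durable, List<String> notified) {
            this.durable = durable;
            this.notified = notified;
        }

        void finish(String path, boolean crashBeforeHook) {
            if (!durable.finished.add(path)) {
                return;                         // already final: no transition, no hook
            }
            if (crashBeforeHook) {
                throw new IllegalStateException("simulated failure before the hook");
            }
            notified.add(path);                 // the hook call
        }
    }

    // Runs one attempt followed by one recovery; returns what the external system saw.
    static List<String> run(boolean crashBeforeHook) {
        Durable durable = new Durable();
        List<String> notified = new ArrayList<>();
        try {
            new Sink(durable, notified).finish("part-0-0", crashBeforeHook);
        } catch (IllegalStateException e) {
            // The task fails here: the file is final, but the hook was never called.
        }
        // Recovery: a fresh sink instance finds the file already in its final state.
        new Sink(durable, notified).finish("part-0-0", false);
        return notified;
    }

    public static void main(String[] args) {
        System.out.println("without failure: " + run(false));
        System.out.println("with failure:    " + run(true));
    }
}
```

   In the failure case the external system never hears about the file, because after recovery
there is no state transition left to report. Closing that gap would need some form of
replay or idempotent re-notification on restore, which is beyond what a plain callback offers.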
   
   Let me know what you think about the above!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
