flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Flink and Directory Monitors
Date Wed, 09 Mar 2016 09:04:20 GMT
Hi Philippe,

I am not aware of anybody using Directory Monitor with Flink. However, the
application you described sounds reasonable and I think it should be
possible to implement that with Flink.

You would need to implement a SourceFunction that forwards events from DM
to Flink, or push the DM events into Kafka and use Flink's Kafka
SourceFunction. Using Kafka has the benefit that fault tolerance and
exactly-once behavior are much easier to achieve because Kafka buffers
events for some time and Flink's Kafka source can replay the events if
necessary. If you implement a direct DM source for Flink, you would need to
implement the buffering yourself to achieve exactly-once or at-least-once
guarantees.
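
To give a rough idea of what "implementing the buffering yourself" could
involve, here is a minimal sketch in plain Java. It does not use any Flink or
Directory Monitor API; the class ReplayBuffer and its methods are hypothetical
names chosen for illustration. The assumption is that each event gets an
increasing offset, is retained until a checkpoint covering it has completed,
and can be replayed from the last checkpointed offset after a failure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical replayable buffer for file-change events (illustration only). */
class ReplayBuffer {

    /** An event paired with the offset assigned to it on arrival. */
    private static final class Entry {
        final long offset;
        final String event;

        Entry(long offset, String event) {
            this.offset = offset;
            this.event = event;
        }
    }

    private final Deque<Entry> retained = new ArrayDeque<>();
    private long nextOffset = 0;

    /** Stores an incoming event and returns the offset assigned to it. */
    public synchronized long append(String event) {
        long offset = nextOffset++;
        retained.addLast(new Entry(offset, event));
        return offset;
    }

    /** Returns all retained events with offset >= fromOffset, oldest first. */
    public synchronized List<String> replayFrom(long fromOffset) {
        List<String> out = new ArrayList<>();
        for (Entry e : retained) {
            if (e.offset >= fromOffset) {
                out.add(e.event);
            }
        }
        return out;
    }

    /** Drops events with offset < upToOffset, e.g. after a completed checkpoint. */
    public synchronized void acknowledge(long upToOffset) {
        while (!retained.isEmpty() && retained.peekFirst().offset < upToOffset) {
            retained.removeFirst();
        }
    }

    /** Number of events still held for possible replay. */
    public synchronized int size() {
        return retained.size();
    }
}
```

A source built on something like this would emit events in offset order and
call acknowledge only once it knows a checkpoint has completed. Replaying from
the last acknowledged offset gives at-least-once delivery; for exactly-once,
the current read offset would presumably also have to be stored as part of the
source's checkpointed state.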

You do not need HDFS to communicate between DM and Flink; events can be
directly consumed without going through a filesystem. However, Flink
requires a persistent state backend to back up checkpoints for failure
recovery. This is usually HDFS, but that component is pluggable.

Cheers, Fabian

2016-03-07 15:53 GMT+01:00 <phiroc@free.fr>:

> Hello,
> has anyone ever used Flink with file/directory monitoring applications
> such as Directory Monitor (https://directorymonitor.com/)?
> Is it conceivable to process file-update events with Flink? For instance,
> let's say hundreds of users simultaneously modify files on a filesystem.
> Directory Monitor detects those modifications and sends them as
> events, streams, or log entries to Flink, which processes them to extract,
> say, the names of the files that have been modified the most, over a period
> of time, or the names of the biggest filesystem hogs (i.e., users who
> consume the most filesystem space).
> Would Hadoop be needed between Directory Monitor and Flink, to store
> historical, filesystem-change data?
> Many thanks.
> Philippe
