flume-user mailing list archives

From Buntu Dev <buntu...@gmail.com>
Subject De-duping events during ingestion
Date Thu, 16 Apr 2015 23:46:58 GMT
Are there any known strategies for handling duplicate events during ingestion?
I use Flume to ingest Apache logs and parse each request with Morphlines.
Some requests are duplicates of one another, differing only in certain query
params. I would like to handle these after I parse and split the query params
into tokens in Morphlines. How does one look up previous events in the stream
(say, within a 5-minute window) and de-dupe before writing to the sink?
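
To make the question concrete, here is a minimal sketch of the windowed de-duplication described above. It is plain Python, not Flume or Morphlines API, and every name in it (`dedupe_key`, `WindowDeduper`, the `ignored` parameter set) is hypothetical. It assumes events arrive in roughly non-decreasing timestamp order and that all state fits in the memory of a single process.

```python
from collections import OrderedDict


def dedupe_key(path, params, ignored=frozenset()):
    """Build a comparison key from the request path and its query params,
    leaving out params that are allowed to differ between duplicates."""
    kept = tuple(sorted((k, v) for k, v in params.items() if k not in ignored))
    return (path, kept)


class WindowDeduper:
    """Remembers keys first seen within the last `window_seconds`."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        # key -> timestamp of first sighting; insertion order is oldest first
        self._seen = OrderedDict()

    def is_duplicate(self, key, timestamp):
        # Evict every key whose first sighting has fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._seen:
            oldest_key, oldest_ts = next(iter(self._seen.items()))
            if oldest_ts > cutoff:
                break
            del self._seen[oldest_key]

        if key in self._seen:
            return True  # same key already seen inside the window
        self._seen[key] = timestamp
        return False


if __name__ == "__main__":
    deduper = WindowDeduper(window_seconds=300)
    first = dedupe_key("/index", {"id": "1", "ts": "100"}, ignored={"ts"})
    second = dedupe_key("/index", {"id": "1", "ts": "200"}, ignored={"ts"})
    print(deduper.is_duplicate(first, 0))
    print(deduper.is_duplicate(second, 60))
```

In a Flume pipeline, logic like this would presumably sit somewhere before the sink, for example in a custom interceptor or a custom Morphline command, and it would only catch duplicates that pass through the same agent.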

