nifi-dev mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: Using map cache clients to detect already processed files
Date Sat, 15 Dec 2018 14:11:10 GMT
Mike,

There is a DetectDuplicate processor. It lets you specify an attribute to use as the
identifier (for example, a SHA-256 hash of the content, an identifier taken from the data,
or the filename). It tracks the identifiers it has seen through a DistributedMapCacheClient,
so it can be backed by Redis or any of the other implementations we have available. Would
that give you what you need?
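
For illustration only, here is a rough Python sketch of the idea: hash the content, then do a put-if-absent against a map cache to decide whether the file has been seen. It does not call any NiFi API. The `InMemoryMapCache` class, its `put_if_absent` method, and the `route` helper are hypothetical stand-ins for whatever cache service backs the flow, and the "duplicate" / "non-duplicate" strings simply label the two outcomes.

```python
import hashlib


class InMemoryMapCache:
    """Hypothetical stand-in for a shared map cache (e.g. one backed by Redis)."""

    def __init__(self):
        self._entries = {}

    def put_if_absent(self, key, value):
        """Store value under key only if key is new; return True if it was stored."""
        if key in self._entries:
            return False
        self._entries[key] = value
        return True


def content_id(data: bytes) -> str:
    """Identify a file by the SHA-256 hex digest of its content."""
    return hashlib.sha256(data).hexdigest()


def route(cache, data: bytes, description: str) -> str:
    """Label the data 'non-duplicate' the first time its hash is seen, else 'duplicate'."""
    key = content_id(data)
    return "non-duplicate" if cache.put_if_absent(key, description) else "duplicate"


cache = InMemoryMapCache()
first = route(cache, b"id,name\n1,alice\n", "teamA/file1.csv")
second = route(cache, b"id,name\n1,alice\n", "teamB/copy.csv")  # same bytes, new name
third = route(cache, b"id,name\n2,bob\n", "teamB/file2.csv")
print(first, second, third)
```

Because the key is derived from the content rather than the filename, the same file submitted by two teams under different names is still caught. Keying on a filename or a record identifier instead would only require swapping out `content_id`.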

Thanks
-Mark


> On Dec 15, 2018, at 8:52 AM, Mike Thomsen <mikerthomsen@gmail.com> wrote:
> 
> We are getting a lot of independent submissions of data from various and
> sundry teams that work with our client, and our client may need a processor
> that roughly does this story:
> 
> "as a NiFi user, I would like to be able to detect whether a file has been
> seen before and processed based on feedback from an RDBMS/HBase/Elastic and
> then be able to choose whether to reprocess it or drop it."
> 
> Want to make sure that I'm not reinventing the wheel before writing such a
> processor.
> 
> Thanks,
> 
> Mike
