nifi-dev mailing list archives

From Dave Hirko <d...@b23.io>
Subject Re: In ListS3 processor, where does Nifi persists the state of objects?
Date Sun, 22 Jan 2017 20:45:04 GMT
We use the ListS3 processor quite a bit, and maintaining state for millions of objects in S3
is critical for us. If we lost track of that state, we would have to re-download a large
number of S3 objects, which is costly.

We use the "local-provider" with its default directory, "./state/local".

When I had to migrate to a new instance, we could not afford to lose state, so I copied the
entire original "./state/local" directory from the old instance to the new one. The ListS3
processor on the new instance picked up the state from the old one successfully. I didn't
see any documentation on this, but I was able to get it to work.
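
A minimal sketch of that copy step, assuming NiFi is stopped on both instances and the old state directory is reachable from the new host (for example, through a mounted volume). The function name and any paths passed to it are hypothetical; none of this is part of NiFi, and it needs Python 3.8 or later.

```python
import shutil
from pathlib import Path


def copy_local_state(old_state_dir: str, new_state_dir: str) -> int:
    """Copy a local state directory tree and return the number of files copied.

    Assumption: both directories are plain directories on a filesystem
    visible to this script, and nothing is writing to them during the copy.
    """
    src = Path(old_state_dir)
    dst = Path(new_state_dir)

    if not src.is_dir():
        raise FileNotFoundError(f"state directory not found: {src}")

    # Refuse to merge into a directory that already holds state.
    if dst.exists() and any(dst.iterdir()):
        raise FileExistsError(f"refusing to overwrite non-empty directory: {dst}")

    # Copy the whole tree, creating missing parent directories as needed.
    shutil.copytree(src, dst, dirs_exist_ok=True)

    # Count the regular files that now exist under the destination.
    return sum(1 for p in dst.rglob("*") if p.is_file())
```

Calling something like copy_local_state("/old/nifi/state/local", "/new/nifi/state/local") before starting the new instance mirrors what I did by hand. The non-empty check is there so that existing state on the new instance is not overwritten by accident.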

I have not figured out how to manipulate the state intentionally. There are use cases where
we need to go back a few days and relist recent objects, so being able to roll the state
back to a particular date would be helpful. That would let us "re-list" objects based on
date parameters. As a workaround, I've added date filters.
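
The idea behind the date-filter workaround can be sketched as below. This only illustrates the filtering logic in plain Python; it is not how ListS3 or NiFi implements anything, and the function name and the "key" and "last_modified" fields are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def keys_modified_within(objects, days_back, now=None):
    """Return the keys of objects last modified within the past `days_back` days.

    `objects` is assumed to be an iterable of dicts with a "key" string and a
    timezone-aware "last_modified" datetime (hypothetical field names).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_back)
    # Keep only the objects at or after the cutoff, preserving input order.
    return [obj["key"] for obj in objects if obj["last_modified"] >= cutoff]
```

Passing a listing together with days_back=3 would keep only the objects from the last three days, which is the "go back in time a few days" case, without touching the processor's stored state.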

--

Dave Hirko | dave@b23.io | 571.421.7729

On Sun, 2017-01-22 at 08:21 -0700, Toivo Adams wrote:

Hi,

As far as I know, ListS3 uses NiFi's built-in StateManager, which in turn uses
StateProviders.
NiFi can have different StateProvider implementations.
Currently NiFi has two providers: one ZooKeeper-based and one based on a write-ahead
log file.
The ZooKeeper provider is used when a NiFi cluster is configured, and the other is used
for a local single-node NiFi.
As I understand it, NiFi automatically chooses ZooKeeper for a cluster and the
local provider for a single NiFi instance.

You can replay a FlowFile.
Open Data Provenance, choose a Provenance Event, open the CONTENT tab and click
REPLAY.
Also, many NiFi processors have a Failure relationship, which is used to route
failed FlowFiles to some other path, so you can automate how failed FlowFiles
are handled.

Data Provenance is the simplest way to see successfully processed files.
But you can also create a custom Reporting Task to collect Provenance Events and do
whatever you need.

Regards
Toivo



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/In-ListS3-processor-where-does-Nifi-persists-the-state-of-objects-tp14489p14490.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
