nifi-dev mailing list archives

From Kevin Telford <kevin.telf...@gmail.com>
Subject Re: Persisting/loading NIFI flows for Docker deployment
Date Mon, 13 Apr 2020 13:07:56 GMT
Hi all - thank you for all the thoughtful feedback.

Regarding my original question, I think the patterns Mike outlined would be good enough.
That said, we're not going to move forward using NiFi for the project, and I figured I'd take
a step back to explain where we were coming from, as some may find the perspective useful.
Or not :)


We have a project that needs some data transformation. Input is Excel; output is multiple CSVs
or POSTs of data to an API. On the surface, simple enough.

Our input Excel can and will change a lot, so we'll need rapid iteration and testing.

The project architecture is container-based, currently consisting of a front-end Docker image,
a back-end image, and a database image. ETL is intended to be a fourth. It can be orchestrated
with Docker Compose or Kubernetes, or run on bare metal. The goal is to be turnkey and low friction.

There were two reasons we didn't choose NiFi: the painful (read: long) Java deployment lifecycle
for custom processors, and system complexity, particularly around deploying updated flows.

Regarding the pain of Java, I've partied with Java since 1.4, so I get it. But these days,
if I have a data analyst/data engineer with lowish programming skills, I can't have them compiling
and moving around jars, nor do I want to invest in building out a build/deploy pipeline.
Platforms have really evolved (look especially at the cloud-native tools): code can be
written inline in the UI and just deployed. A lot of this is due to dynamic languages
(e.g. Python), but it can also be done with Java via behind-the-scenes compilation. Jupyter
Notebook, for its many, many faults, is the way things are heading, and the kids love it.

I touched a lot on updating flows above, but in NiFi my choices seemed to be to replace the
flow.xml.gz file or to use the NiFi Registry. My concern with the Registry was that it was yet
another moving part, and even then I'd still have to build out source-control workflows. Here again,
newer platforms have all of this baked in.
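
For anyone finding this thread later, the "replace the flow.xml.gz file" route can at least be
scripted against the container. A rough, untested sketch (the container name and host paths are
just placeholders):

    # Pull the current flow out of a running NiFi container and version it
    docker cp nifi:/opt/nifi/nifi-current/conf/flow.xml.gz flows/flow.xml.gz
    git add flows/flow.xml.gz
    git commit -m "Snapshot NiFi flow"

    # On a new environment, copy the versioned flow into the (created or
    # stopped) target container before starting NiFi
    docker cp flows/flow.xml.gz nifi:/opt/nifi/nifi-current/conf/flow.xml.gz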


In closing, I think there is definitely still a place for NiFi, especially on the enterprise
side where stability, scale, and management are paramount. But I did want to share this, as
the non-enterprise use cases I am describing will, over time, become the enterprise use cases,
and the NiFi project would do well to evaluate its long-term strategy.

Thanks again for all the responses.
Best,
Kevin

On 2020/04/08 14:27:54, Kevin Telford <kevin.telford@gmail.com> wrote: 
> Hi all – I have a two part question.
> 
> 
> 
> I’d like to run NiFi inside a container in order to deploy to various
> environments. As far as I can tell, the flow.xml.gz file is the main
> “source,” if you will, for a NiFi data flow.
> 
> Q1) Is the flow.xml.gz file the “source” of a NiFi data flow, and if so, is
> it best practice to copy it to a new env in order to “deploy” a prebuilt
> flow? Or how is this best handled?
> 
> 
> 
> Assuming the answer to Q1 is yes, my challenge then becomes somewhat Docker-specific…
> 
> Situation:
> 
>    - In the Dockerfile we unzip the NiFi source (L62
>    <https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/Dockerfile#L62>)
>    and then create Docker volumes (L75
>    <https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/Dockerfile#L75>,
>    specifically for the conf dir). Once the container starts, all the normal
>    NiFi startup things happen, and /opt/nifi/nifi-current/conf/flow.xml.gz is
>    created.
> 
> Complication:
> 
>    - In order to persist flow.xml.gz outside of the container, I would
>    normally mount the /opt/nifi/nifi-current/conf directory; however, in this
>    case I cannot mount it at initialization because that would hide the
>    container's conf files behind whatever host directory I bind to it (the
>    bind-mounted host directory takes precedence over the container's files).
>    - I could mount to a running container, but this is less ideal given the
>    various ways a container can be deployed.
>    - I could copy manually from the running container, but this is less
>    ideal as it's on demand and doesn't always persist the latest flow.
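
Coming back to this complication: mounting just the flow file avoids shadowing the rest of conf.
A rough, untested sketch (image tag, port, and host path are illustrative; also note that if NiFi
replaces the file on save rather than rewriting it in place, a single-file mount may not track
the changes):

    # Mount only flow.xml.gz so the baked-in conf directory stays intact
    docker run -d --name nifi \
      -p 8080:8080 \
      -v "$(pwd)/flows/flow.xml.gz:/opt/nifi/nifi-current/conf/flow.xml.gz" \
      apache/nifi:1.11.4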
> 
> Resolution:
> 
>    - I believe instead, we would ideally create a few flow-config-specific
>    env vars and use them to update our nifi.properties (via
>    https://github.com/apache/nifi/blob/master/nifi-docker/dockerhub/sh/start.sh),
>    e.g. NIFI_FLOW_CONFIG_FILE_LOCATION, NIFI_FLOW_CONFIG_ARCHIVE_ENABLED,
>    NIFI_FLOW_CONFIG_ARCHIVE_DIR, and so on for all of the
>    nifi.flow.configuration properties.
> 
> Q2) Would the above proposal (adding a few env vars to start.sh) be ideal?
> If so, I'm happy to open a PR for the code and doc changes. Or have others
> solved this a different way?
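
For concreteness, the mapping I had in mind was roughly the following (illustrative only, not
actual start.sh code; the env var names are the ones proposed above, and the property keys are
the standard nifi.flow.configuration.* ones):

    # Sketch: map optional env vars onto nifi.properties before NiFi starts
    NIFI_HOME="${NIFI_HOME:-/opt/nifi/nifi-current}"
    props="${NIFI_HOME}/conf/nifi.properties"

    if [ -n "${NIFI_FLOW_CONFIG_FILE_LOCATION}" ]; then
      sed -i "s|^nifi.flow.configuration.file=.*|nifi.flow.configuration.file=${NIFI_FLOW_CONFIG_FILE_LOCATION}|" "${props}"
    fi
    if [ -n "${NIFI_FLOW_CONFIG_ARCHIVE_DIR}" ]; then
      sed -i "s|^nifi.flow.configuration.archive.dir=.*|nifi.flow.configuration.archive.dir=${NIFI_FLOW_CONFIG_ARCHIVE_DIR}|" "${props}"
    fi

NIFI_FLOW_CONFIG_ARCHIVE_ENABLED would map onto nifi.flow.configuration.archive.enabled in the
same way.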
> 
> 
> 
> Best,
> 
> Kevin
> 
