nifi-dev mailing list archives

From "Chowdhury, Rifat" <Rifat.Chowdh...@disney.com>
Subject Need Advice on Nifi Cluster Using Separate Storage for Provenance Repo, Flowfile Repo and Content Repo.
Date Wed, 30 Oct 2019 17:34:21 GMT
Hi,

My name is Rifat. I am a software engineer at ESPN/Disney. I have been using NiFi for almost
a year now, and we have a 10-node NiFi cluster set up in our production environment. As per
the best practices document:

https://community.cloudera.com/t5/Community-Articles/NiFi-Sizing-Guide-Deployment-Best-Practices/ta-p/246781

I would like to have 5 separate repositories for the Content Repo, 5 for the Provenance Repo
and 1 for the FlowFile Repo on each node. I need your expert advice on whether EBS or EFS is
the better approach to achieve this. I already tried EFS and saw some problems with load
balancing, since I mounted one EFS volume per partition (that's 11 EFS partitions per node).
Below is the response I got from AWS after raising a support ticket with them.


Hello Rifat,

Thank you for contacting AWS Premium Support.

I am not familiar with Apache NiFi, as it is third-party software not covered by our support
policy [1]. That being said, I had a look at its documentation [2] and some related links,
and it doesn't seem to me that these repositories are meant to be on shared storage accessible
by all nodes. If you look at the NiFi Architecture section, it's suggested that this data
is stored locally on each node, and the guidelines in the link you provided us with are aligned
with that principle. I did find some connection between NiFi and Hadoop Distributed File System
(HDFS), which has some fundamental differences to EFS, however it doesn't seem to have any
relation with these repositories.

While EFS itself provides strong durability and availability guarantees, the NFS protocol
is meant to provide weaker cache coherence among its clients as a trade-off for higher performance.
Characteristics such as Attribute Caching, Directory Entry Caching, Asynchronous writes, and
the differences in how file timestamps are maintained lead to discrepancies in how each node
sees data, potentially impacting clustered applications expecting strong consistency. You'll
find a good write-up on that in the Linux NFS documentation [3], section "Data and Metadata
Coherence".

To see if one of these characteristics is causing the issue, I advise you to append 'sync'
and 'noac' as mount options for all EFS resources in all nodes; the first one will cause all
write I/O to become synchronous, and the second one will disable Attribute and Directory Entry
caching. If that helps resolve the issues you are seeing, we'll know that NiFi is expecting
strong cache coherence. However, you'll need to evaluate if the performance penalty of mounting
with these options is bearable. It may be possible that EBS or even Instance Store are better
options to host these repositories, provided that you understand the differences in performance
and durability between the two.
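
As a rough sketch of what that test could look like, the snippet below appends 'sync' and
'noac' to an existing NFS option string and formats an /etc/fstab entry from it. The file
system name, mount point and baseline options are placeholders chosen for illustration, not
values from this case; the recommended baseline options are the ones listed in [4].

```python
def with_test_options(options, extra=("sync", "noac")):
    """Append the extra NFS mount options, skipping any already present."""
    current = [opt for opt in options.split(",") if opt]
    for opt in extra:
        if opt not in current:
            current.append(opt)
    return ",".join(current)


def fstab_line(filesystem, mount_point, options):
    """Format one /etc/fstab entry for an NFSv4 mount."""
    return f"{filesystem}:/ {mount_point} nfs4 {options} 0 0"


# Placeholder values (assumptions): substitute the real file system DNS name,
# mount point and the option string currently in use on each node.
print(
    fstab_line(
        "fs-12345678.efs.us-east-1.amazonaws.com",
        "/mnt/nifi/content_repo1",
        with_test_options("nfsvers=4.1,hard,timeo=600,retrans=2"),
    )
)
```

The same option string would need to be applied to every EFS mount on every node, and each
file system remounted, before the comparison is meaningful.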

On a side note, you are missing a few of the recommended mount options for EFS. Although I
don't expect them to cause an immediate impact for the issue described in this support case,
it's a good idea to implement them to avoid other issues. Please check here [4] for details.

Regarding your question on how to enable communication between directories that are mounted
on a different EFS, this whole idea of inter-EFS communication does not apply. EFS is a file
system, and there's no exchange of data between separate EFS resources; the only "communication"
in that sense would be moving data from one EFS to another, which can be done within an instance
having both file systems mounted. I believe that at this stage, testing the solution with
the proposed mount options above is a good course of action to isolate the problem.

With regard to your comment on logging into these machines and reading the contents of /var/log/nifi,
please note that Support personnel are not allowed under any circumstances to access customers'
instances. At this stage, I believe that these logs are not required for this case.

To summarise, my first recommendation is that you ask NiFi experts whether using a
distributed file system such as EFS to host the cluster nodes' repositories is a valid approach.
If you are using Cloudera Flow Management, you should be able to receive support from Cloudera;
otherwise the NiFi Community is an option [6]. My second recommendation is to test EFS mounted
with 'sync' and 'noac' to see if it helps resolve the issue; if the performance penalty is
unbearable, consider switching the repositories to EBS or Instance Store volumes.

If you have questions on the above, please let me know.  


Please let me know the best approach to solving this problem.
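
For reference, here is a minimal sketch of how the per-node layout described at the top of
this message (1 FlowFile, 5 Content and 5 Provenance repositories) might map onto
nifi.properties. The base path and the directory suffixes are made up for illustration. The
property prefixes follow the pattern in the NiFi admin guide, where extra content and
provenance directories are added under a unique suffix; please check them against the NiFi
version in use.

```python
def repo_properties(base="/mnt/nifi", content=5, provenance=5):
    """Build nifi.properties lines for one node's repository directories."""
    # A node has a single FlowFile repository directory.
    lines = [f"nifi.flowfile.repository.directory={base}/flowfile_repo"]
    # Content and provenance repositories accept several directories,
    # each registered under its own suffix (content1, content2, ...).
    for kind, count in (("content", content), ("provenance", provenance)):
        for i in range(1, count + 1):
            lines.append(
                f"nifi.{kind}.repository.directory.{kind}{i}={base}/{kind}_repo{i}"
            )
    return lines


# Hypothetical base path; each directory would sit on its own volume.
print("\n".join(repo_properties()))
```

Each of the 11 directories would then be backed by its own mount, whichever storage type
turns out to be appropriate.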


Best Regards, Rifat
