nifi-dev mailing list archives

From Phil H <gippyp...@gmail.com>
Subject Re: Ideal hardware for NiFi
Date Tue, 11 Sep 2018 21:10:05 GMT
Fantastic, thanks Mark

On Wed, 12 Sep 2018 at 06:15, Mark Payne <markap14@hotmail.com> wrote:

> Phil,
>
> For the content repository, you can configure the directory by changing the
> value of the "nifi.content.repository.directory.default" property in
> nifi.properties. The suffix here, "default", is the name of this
> "container". You can have multiple containers by adding extra properties.
> For example, you could set:
>
> nifi.content.repository.directory.content1=/nifi/repos/content-1
> nifi.content.repository.directory.content2=/nifi/repos/content-2
> nifi.content.repository.directory.content3=/nifi/repos/content-3
> nifi.content.repository.directory.content4=/nifi/repos/content-4
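The four content containers above follow a simple pattern, so with many disks it can help to generate the lines instead of typing them. Below is a minimal Python sketch. `container_properties` is a hypothetical helper, not part of NiFi, and the numbered naming scheme is only an assumption that mirrors the example.

```python
# Hypothetical helper (not part of NiFi): generates per-container property
# lines for one repository, one container per mount point.
from typing import Dict, List


def container_properties(repo: str, name_prefix: str, mounts: List[str]) -> Dict[str, str]:
    """Return {property key: directory} for nifi.<repo>.repository.directory.<name>.

    `repo` is "content" or "provenance". Container names are `name_prefix`
    plus a 1-based index; that naming is our choice, the suffix is just the
    container name.
    """
    props: Dict[str, str] = {}
    for i, mount in enumerate(mounts, start=1):
        props[f"nifi.{repo}.repository.directory.{name_prefix}{i}"] = mount
    return props


if __name__ == "__main__":
    mounts = [f"/nifi/repos/content-{i}" for i in range(1, 5)]
    for key, value in container_properties("content", "content", mounts).items():
        print(f"{key}={value}")
```

Run as-is, it prints the same four `nifi.content.repository.directory.contentN` lines shown in the example.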
>
> Similarly, the Provenance Repo property is named
> "nifi.provenance.repository.directory.default" and can have any number of
> "containers":
>
> nifi.provenance.repository.directory.prov1=/nifi/repos/prov-1
> nifi.provenance.repository.directory.prov2=/nifi/repos/prov-2
> nifi.provenance.repository.directory.prov3=/nifi/repos/prov-3
> nifi.provenance.repository.directory.prov4=/nifi/repos/prov-4
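To double-check which containers a nifi.properties file actually defines for each repository, a small parser can group the `.directory.<name>` keys. This is a simplified sketch. `repository_containers` is a hypothetical helper that understands only plain `key=value` lines and `#` comments, not the full Java properties syntax.

```python
# Simplified sketch: handles plain key=value lines and '#' comments only,
# not escapes, line continuations or ':' separators.
from typing import Dict

PREFIXES = {
    "content": "nifi.content.repository.directory.",
    "provenance": "nifi.provenance.repository.directory.",
}


def repository_containers(properties_text: str) -> Dict[str, Dict[str, str]]:
    """Return {"content": {container: dir}, "provenance": {container: dir}}."""
    result: Dict[str, Dict[str, str]] = {repo: {} for repo in PREFIXES}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        for repo, prefix in PREFIXES.items():
            if key.startswith(prefix):
                # Everything after the prefix is the container name, e.g. "prov1".
                result[repo][key[len(prefix):]] = value.strip()
    return result


# Illustrative input only.
SAMPLE = """
# Four provenance containers, one per disk
nifi.provenance.repository.directory.prov1=/nifi/repos/prov-1
nifi.provenance.repository.directory.prov2=/nifi/repos/prov-2
nifi.provenance.repository.directory.prov3=/nifi/repos/prov-3
nifi.provenance.repository.directory.prov4=/nifi/repos/prov-4
nifi.content.repository.directory.default=./content_repository
"""

if __name__ == "__main__":
    for repo, containers in repository_containers(SAMPLE).items():
        print(repo, sorted(containers))
```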
>
> When NiFi writes to these, it round-robins across them, so if you're writing
> the content of 4 FlowFiles simultaneously from different threads, you get
> the full throughput of each disk. (So if you have 4 disks for your content
> repo, each capable of writing 100 MB/sec, your effective write rate to the
> content repo is 400 MB/sec.) The same applies to the Provenance Repository.
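The round-robin behaviour and the 4 × 100 MB/sec = 400 MB/sec arithmetic can be modelled in a few lines. This is an illustrative model under stated assumptions: identical disks, and extra writers beyond the disk count add nothing. It is not NiFi's repository code, and the class and function names are hypothetical.

```python
# Illustrative model only: mimics round-robin container selection; it is not
# NiFi's repository implementation.
import itertools
import threading
from typing import List


class RoundRobinContainers:
    """Hands out container names in rotation; safe to call from many threads."""

    def __init__(self, containers: List[str]) -> None:
        self._cycle = itertools.cycle(containers)
        self._lock = threading.Lock()  # so concurrent callers never race on the shared iterator

    def next_container(self) -> str:
        with self._lock:
            return next(self._cycle)


def effective_write_rate(per_disk_mb_s: float, disks: int, concurrent_writers: int) -> float:
    """Aggregate MB/sec, assuming identical disks; writers beyond the disk count add nothing."""
    return float(per_disk_mb_s * min(disks, concurrent_writers))


if __name__ == "__main__":
    rr = RoundRobinContainers(["content1", "content2", "content3", "content4"])
    print([rr.next_container() for _ in range(6)])
    # ['content1', 'content2', 'content3', 'content4', 'content1', 'content2']
    print(effective_write_rate(100, disks=4, concurrent_writers=4))  # 400.0
    print(effective_write_rate(100, disks=4, concurrent_writers=1))  # 100.0
```

The single-writer case shows why the speed-up depends on having several threads writing content at once.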
>
> Doing this will also allow you to hold a larger 'archive' of content and
> provenance data, because the archive is spread across all of the listed
> directories as well.
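Because the archive is spread across every listed directory, the space available to it grows with the number of distinct disks. Below is a sketch that totals that raw capacity, assuming the directories already exist. `combined_capacity_bytes` is a hypothetical helper. It counts directories on the same filesystem only once, since they add no extra space.

```python
# Sketch: raw capacity behind a set of container directories. Directories on
# the same filesystem are counted once, since they add no extra space.
import os
import shutil
from typing import Iterable


def combined_capacity_bytes(directories: Iterable[str]) -> int:
    seen_devices = set()
    total = 0
    for directory in directories:
        device = os.stat(directory).st_dev  # ID of the filesystem holding the directory
        if device in seen_devices:
            continue
        seen_devices.add(device)
        total += shutil.disk_usage(directory).total
    return total


if __name__ == "__main__":
    # Example paths from the thread; only the ones that exist here are counted.
    candidates = [f"/nifi/repos/content-{i}" for i in range(1, 5)]
    present = [d for d in candidates if os.path.isdir(d)]
    print(f"{combined_capacity_bytes(present) / 1e9:.1f} GB across {len(present)} directories")
```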
>
> Thanks
> -Mark
>
>
>
> > On Sep 11, 2018, at 3:35 PM, Phil H <gippyphil@gmail.com> wrote:
> >
> > Thanks Mark, this is great advice.
> >
> > Disk access is certainly an issue with the current setup. I will shoot for
> > NVMe disks in the build. How does NiFi get configured to span its
> > repositories across multiple physical disks?
> >
> > Thanks,
> > Phil
> >
> > On Wed, 12 Sep 2018 at 01:32, Mark Payne <markap14@hotmail.com> wrote:
> >
> >> Phil,
> >>
> >> As Sivaprasanna mentioned, your bottleneck will certainly depend on your
> >> flow. There's nothing inherent in NiFi or the JVM, AFAIK, that would limit
> >> you. I've seen NiFi run on VMs with 4-8 cores, and I've seen it run on
> >> bare metal on servers with 96+ cores. Most often, I see people with a lot
> >> of CPU cores but insufficient disk, so if you're running many cores, make
> >> sure that you're using SSDs / NVMe drives or enough spinning disks to keep
> >> up with the flow. NiFi does a good job of spanning the content and
> >> FlowFile repositories across multiple disks to take full advantage of the
> >> hardware, and it scales vertically across CPU cores by running multiple
> >> Processors and multiple concurrent tasks (threads) on a given Processor.
> >>
> >> It really comes down to what you're doing in your flow, though. If you've
> >> got 96 cores and you're trying to perform 5 dozen transformations against
> >> a large number of FlowFiles but have only a single spinning disk, then
> >> those 96 cores will likely go to waste, because your disk will bottleneck
> >> you.
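The trade-off in the 96-core example can be framed as a back-of-the-envelope model: the flow runs no faster than the slower of its CPU and disk limits. The per-core and per-disk rates below are made-up placeholders, not measurements. Substitute numbers from your own flow. `NodeSpec` and `bottleneck` are hypothetical names.

```python
# Back-of-the-envelope model (our simplification; NiFi computes nothing like
# this): throughput is capped by whichever resource runs out first.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NodeSpec:
    cores: int
    disks: int
    mb_s_per_core: float  # placeholder: data one busy core can push through the flow
    mb_s_per_disk: float  # placeholder: sustained throughput of one disk


def bottleneck(node: NodeSpec) -> Tuple[str, float, float]:
    """Return (limiting resource, max MB/sec, fraction of the other resource in use)."""
    cpu_limit = node.cores * node.mb_s_per_core
    disk_limit = node.disks * node.mb_s_per_disk
    if cpu_limit <= disk_limit:
        return "cpu", cpu_limit, cpu_limit / disk_limit
    return "disk", disk_limit, disk_limit / cpu_limit


if __name__ == "__main__":
    # 96 cores and a single spinning disk, with made-up per-unit rates.
    node = NodeSpec(cores=96, disks=1, mb_s_per_core=20.0, mb_s_per_disk=100.0)
    resource, limit, used = bottleneck(node)
    print(f"{resource}-bound at ~{limit:.0f} MB/sec; CPU ~{used:.0%} used")
```

With these placeholder numbers, the model reports the node as disk-bound, with most of the CPU idle. That matches the point being made. The same function flags the reverse situation when cores are scarce relative to disks.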
> >>
> >> Likewise, if you have 10 SSDs and only 8 cores, you're likely going to
> >> waste a lot of disk throughput, because you won't have the CPU needed to
> >> reach the disks' full potential. So you'll need to strike the right
> >> balance for your use case. Since you have the flow running right now, I
> >> would recommend looking at tools like `top` and `iostat` to understand
> >> whether you're reaching your limit on CPU, disk, etc.
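For the disk side of that check, `iostat -x` reports a %util column. On Linux, a similar figure can be derived from two snapshots of /proc/diskstats, where the 13th whitespace-separated column is the cumulative milliseconds a device spent doing I/O. This is a Linux-only sketch with hypothetical helper names. Like iostat's %util, it can read close to 100% on SSD/NVMe devices that serve many requests in parallel without actually being saturated.

```python
# Linux-only sketch: approximates iostat's %util from two /proc/diskstats
# snapshots. Column 13 of each line is cumulative milliseconds spent doing I/O.
import os
import time
from typing import Dict


def read_io_ticks(diskstats_text: str) -> Dict[str, int]:
    """Map device name -> cumulative ms spent doing I/O."""
    ticks: Dict[str, int] = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 13:
            ticks[fields[2]] = int(fields[12])
    return ticks


def utilization(before: Dict[str, int], after: Dict[str, int], interval_ms: float) -> Dict[str, float]:
    """Percent of the interval each device was busy, capped at 100."""
    return {
        dev: min(100.0, 100.0 * (after[dev] - before[dev]) / interval_ms)
        for dev in after
        if dev in before
    }


def snapshot(path: str = "/proc/diskstats") -> Dict[str, int]:
    with open(path) as f:
        return read_io_ticks(f.read())


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    first = snapshot()
    start = time.monotonic()
    time.sleep(1.0)
    second = snapshot()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    for dev, pct in sorted(utilization(first, second, elapsed_ms).items()):
        if pct > 0:
            print(f"{dev:12s} {pct:5.1f}% busy")
```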
> >>
> >> As far as RAM is concerned, NiFi typically only needs 4-8 GB of RAM for
> >> the heap. However, more RAM means that your operating system can make
> >> better use of disk caching, which can certainly speed things up,
> >> especially if you're reading the content several times for each FlowFile.
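One way to act on the RAM advice is to pin the heap in the 4-8 GB range and treat the remainder as page cache. This is a rough sketch. `plan_memory`, the 2 GB reserve for the OS and JVM off-heap usage, and the defaults are all assumptions. In a stock NiFi install, the heap is set by the -Xms/-Xmx lines in conf/bootstrap.conf (java.arg.2 and java.arg.3 in the default file). Check your own copy before editing.

```python
# Rough memory split (our rule of thumb, not NiFi's): fixed JVM heap, a reserve
# for the OS and JVM off-heap memory, and whatever is left for the page cache.
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryPlan:
    bootstrap_lines: List[str]  # heap settings in the style of bootstrap.conf
    page_cache_gb: int          # RAM the OS can use to cache repository files


def plan_memory(total_ram_gb: int, heap_gb: int = 8, reserve_gb: int = 2) -> MemoryPlan:
    if heap_gb + reserve_gb > total_ram_gb:
        raise ValueError("not enough RAM for the requested heap plus reserve")
    return MemoryPlan(
        # Same min and max heap so the JVM does not resize it at runtime.
        bootstrap_lines=[f"java.arg.2=-Xms{heap_gb}g", f"java.arg.3=-Xmx{heap_gb}g"],
        page_cache_gb=total_ram_gb - heap_gb - reserve_gb,
    )


if __name__ == "__main__":
    plan = plan_memory(total_ram_gb=64)
    print("\n".join(plan.bootstrap_lines))
    print(f"~{plan.page_cache_gb} GB left for the OS page cache")
```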
> >>
> >> Does this help at all?
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >>> On Sep 10, 2018, at 6:05 AM, Phil H <gippyphil@gmail.com> wrote:
> >>>
> >>> Thanks for that. Sorry, I should have been more specific: we already have
> >>> a flow running on non-dedicated hardware. I'm looking to identify any
> >>> limitations in NiFi/the JVM that would cap how much parallelism it can
> >>> take advantage of.
> >>>
> >>> On Mon, 10 Sep 2018 at 14:32, Sivaprasanna <sivaprasanna246@gmail.com> wrote:
> >>>
> >>>> Phil,
> >>>>
> >>>> The hardware requirements are driven by the nature of the dataflow you
> >>>> are developing. If you're looking to play around with NiFi and gain some
> >>>> hands-on experience, go for 4 cores and 8 GB RAM, i.e. any modern
> >>>> laptop/computer would do the job. In my case, where I have 100s of
> >>>> dataflows, I run a 3-node cluster, each node having 16 GB RAM and 4(8)
> >>>> cores. I went with smaller SSDs because my flows involve writing to
> >>>> object stores like Google Cloud Storage, Azure Blob and Amazon S3, and
> >>>> to NoSQL DBs. Hope this helps.
> >>>>
> >>>> -
> >>>> Sivaprasanna
> >>>>
> >>>> On Mon, Sep 10, 2018 at 4:09 AM Phil H <gippyphil@gmail.com> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I've been asked to spec some hardware for a NiFi installation. Does
> >>>>> anyone have any advice? My gut feel is lots of processor cores and RAM,
> >>>>> with less emphasis on storage (small fast disks). Are there any
> >>>>> limitations on how many cores the JRE/NiFi can actually make use of, or
> >>>>> any other considerations like that I should be aware of?
> >>>>>
> >>>>> It will most likely be pairs of servers in a cluster, but again, any
> >>>>> advice to the contrary would be appreciated.
> >>>>>
> >>>>> Cheers,
> >>>>> Phil
> >>>>>
> >>>>
> >>
> >>
>
>
