nifi-issues mailing list archives

From "Brandon DeVries (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (NIFI-4775) Create a FlowFile repo backed by RocksDB
Date Tue, 13 Aug 2019 13:50:00 GMT

    [ https://issues.apache.org/jira/browse/NIFI-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906179#comment-16906179 ]

Brandon DeVries edited comment on NIFI-4775 at 8/13/19 1:49 PM:
----------------------------------------------------------------

[~joewitt], the Hive and Atlas nars appear to increase the size of the build by 60 MB, 150
MB and 275 MB for functionality that is only relevant to people who wish to use those specific
components. The RocksDB depedency increases the size of the build by 12 MB... but in doing
so enables a FlowFile repository implementation that can provide guarantees against data loss
when running at scale (which the current implementation cannot). This isn't a niche interest
to NiFi users, but rather one that is probably near the top of most users' lists.

My understanding is that the current NiFi binary size is ~1.2 GB [1], and our limit is 1.5
GB [2]. I understand wanting to not waste the remaining 300 MB, but I argue these 12 MB are
worth it.

[1] https://nifi.apache.org/download.html
[2] https://issues.apache.org/jira/browse/INFRA-15816


was (Author: devriesb):
The Hive and Atlas nars appear to increase the size of the build by 60 MB, 150 MB and 275
MB for functionality that is only relevant to people who wish to use those specific components.
The RocksDB dependency increases the size of the build by 12 MB... but in doing so enables
a FlowFile repository implementation that can provide guarantees against data loss when running
at scale (which the current implementation cannot). This isn't a niche interest to NiFi users,
but rather one that is probably near the top of most users' lists.

My understanding is that the current NiFi binary size is ~1.2 GB [1], and our limit is 1.5
GB [2]. I understand wanting to not waste the remaining 300 MB, but I argue these 12 MB are
worth it.


[1] https://nifi.apache.org/download.html
[2] https://issues.apache.org/jira/browse/INFRA-15816

> Create a FlowFile repo backed by RocksDB
> ----------------------------------------
>
>                 Key: NIFI-4775
>                 URL: https://issues.apache.org/jira/browse/NIFI-4775
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Brandon DeVries
>            Priority: Major
>             Fix For: 1.10.0
>
>         Attachments: RocksDBFlowFileRepo.html, rocksdb-flowfile-repo.adoc
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, when a FlowFile is written to the FlowFile Repository, the repo can either
fsync or not, depending on nifi.properties. We should allow a third option, of fsync only
for CREATE events. In this case, if we receive new data from a source we can fsync the update
to the FlowFile Repository before ACK'ing the data from the source. This allows us to guarantee
data persistence without the overhead of an fsync for every FlowFile Repository update.
> It may make sense, though, to be a bit more selective about when we do this. For example,
if the source is a system that does not allow us to acknowledge the receipt of data, such
as a ListenUDP processor, this doesn't really buy us much. In such a case, we could be smart
about avoiding the high cost of an fsync. However, for something like GetSFTP where we have
to remove the file in order to 'acknowledge receipt' we can ensure that we wait for the fsync
before proceeding.
> NOTE: This functionality was ultimately provided in a new implementation backed by RocksDB.
>  
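
The "fsync only for CREATE events" policy described in the issue can be illustrated with a minimal sketch. This is not NiFi's FlowFile Repository code and not the RocksDB-backed implementation; it is a plain append-only file using the standard java.nio FileChannel.force call. The class name SelectiveSyncJournal, the EventType enum, and the sourceSupportsAck flag are all hypothetical names introduced for illustration only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: an append-only journal that fsyncs only for CREATE events. */
public class SelectiveSyncJournal implements AutoCloseable {

    /** Hypothetical event types; these are not NiFi's actual record types. */
    public enum EventType { CREATE, UPDATE, DELETE }

    private final FileChannel channel;
    private long syncCount = 0;

    public SelectiveSyncJournal(Path file) throws IOException {
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    /**
     * Appends one record. The write is forced to disk only when the event is a
     * CREATE and the source can be acknowledged (assumption: the caller knows
     * this, e.g. true for an SFTP-style source, false for a UDP-style one).
     */
    public synchronized void append(EventType type, String flowFileId,
                                    boolean sourceSupportsAck) throws IOException {
        byte[] line = (type + " " + flowFileId + "\n").getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.wrap(line);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        if (type == EventType.CREATE && sourceSupportsAck) {
            // Force file content and metadata to the storage device before
            // the caller acknowledges receipt to the source.
            channel.force(true);
            syncCount++;
        }
    }

    /** Number of forced syncs performed so far. */
    public synchronized long getSyncCount() {
        return syncCount;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

In this sketch a caller would append the CREATE record, wait for append to return, and only then acknowledge (or delete) the data at the source. Updates and deletes skip the forced sync, which is the cost saving the issue description is after.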



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
