nifi-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (NIFI-3356) Provide a newly refactored provenance repository
Date Tue, 14 Feb 2017 17:48:41 GMT


ASF GitHub Bot commented on NIFI-3356:

Github user markap14 commented on a diff in the pull request:
    --- Diff: nifi-docs/src/main/asciidoc/administration-guide.adoc ---
    @@ -2074,7 +2074,25 @@ The Provenance Repository contains the information related to Data Provenance. T
    -|nifi.provenance.repository.implementation|The Provenance Repository implementation. The default value is org.apache.nifi.provenance.PersistentProvenanceRepository and should only be changed with caution. To store provenance events in memory instead of on disk (at the risk of data loss in the event of power/machine failure), set this property to org.apache.nifi.provenance.VolatileProvenanceRepository.
    +|nifi.provenance.repository.implementation|The Provenance Repository implementation. The default value is org.apache.nifi.provenance.PersistentProvenanceRepository.
    +Two additional repositories are available as and should only be changed with caution.
    --- End diff --
    I agree. That note was there previously, when the only two options were the Volatile and Persistent Prov Repos, and it was there to warn that you should know what you're doing when you change to Volatile. I think this warning can be removed now, since there are two repos that provide persistent storage of the data.
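The property under discussion is set in nifi.properties. As a rough illustration only (a sketch, not NiFi code), the hypothetical helper `ProvenanceRepoCheck` below reads `nifi.provenance.repository.implementation` with `java.util.Properties`, falls back to the documented default, and flags the Volatile implementation, which keeps events in memory and can lose them on power/machine failure. Treating every other class name as disk-backed is an assumption that may not hold for custom implementations.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical helper (not part of NiFi): inspects nifi.properties-style
 * text and reports which provenance repository implementation is configured.
 */
final class ProvenanceRepoCheck {

    static final String KEY = "nifi.provenance.repository.implementation";

    // Default value stated in the administration guide.
    static final String DEFAULT_IMPL =
            "org.apache.nifi.provenance.PersistentProvenanceRepository";

    // Stores events in memory only.
    static final String VOLATILE_IMPL =
            "org.apache.nifi.provenance.VolatileProvenanceRepository";

    private ProvenanceRepoCheck() {
    }

    /** Returns the configured implementation class, or the default if the key is absent. */
    static String implementation(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader does not do real I/O, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY, DEFAULT_IMPL).trim();
    }

    /**
     * Returns false only for the Volatile implementation. Any other class is
     * assumed (not verified) to store events on disk.
     */
    static boolean persistsToDisk(String implementationClass) {
        return !VOLATILE_IMPL.equals(implementationClass);
    }

    public static void main(String[] args) {
        String text = KEY + "=" + VOLATILE_IMPL + "\n";
        String impl = implementation(text);
        System.out.println(impl + " persists to disk: " + persistsToDisk(impl));
    }
}
```

Running `main` prints `org.apache.nifi.provenance.VolatileProvenanceRepository persists to disk: false`; with an empty properties text, the helper returns the PersistentProvenanceRepository default instead.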

> Provide a newly refactored provenance repository
> ------------------------------------------------
>                 Key: NIFI-3356
>                 URL:
>             Project: Apache NiFi
>          Issue Type: Task
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 1.2.0
> The Persistent Provenance Repository has been redesigned a few different times over several years. The original design for the repository was to provide storage of events and sequential iteration over those events via a Reporting Task. After that, we added the ability to compress the data so that it could be held longer. We then introduced the notion of indexing and searching via Lucene. We've since made several more modifications to try to boost performance.
> At this point, however, the repository is still the bottleneck for many flows that handle large volumes of small FlowFiles. We need a new implementation that is based on the current goals for the repository and that can provide better throughput.

This message was sent by Atlassian JIRA
