nifi-commits mailing list archives

From "Mark Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-756) Persistent Provenance Repository can avoid deleting events from lucene
Date Wed, 19 Aug 2015 15:35:45 GMT

    [ https://issues.apache.org/jira/browse/NIFI-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703202#comment-14703202 ]

Mark Payne commented on NIFI-756:
---------------------------------

[~aldrin]: agreed, the patch is from Aug 15, so I imagine it has not been rebased. I will
create a new patch and upload it.

> Persistent Provenance Repository can avoid deleting events from lucene
> ----------------------------------------------------------------------
>
>                 Key: NIFI-756
>                 URL: https://issues.apache.org/jira/browse/NIFI-756
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 0.3.0
>
>         Attachments: 0001-NIFI-756-Do-not-remove-documents-from-a-Lucene-Index.patch
>
>
> Currently, when events expire in the repository, they are deleted from the indices. This
> is very expensive. Since the index is sharded (by default at 500 MB), we can instead just
> ensure that searches always have a start date no earlier than the first provenance event.
> This way, we won't retrieve any expired records, but they can remain in the index. When all
> events in the index have expired (we know, based on the earliest event of the next index),
> we can simply close all readers/writers for the expired index and delete the entire index.
> This is far cheaper than continually updating the Lucene indices and would make a huge difference
> in performance.
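
The approach described above could be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the class name, method names, and the map of shard start times are all hypothetical, and it assumes shards do not overlap in time, so every event in a shard precedes the earliest event of the next shard. It deliberately uses no Lucene calls; closing readers/writers and deleting the directory would happen wherever the expired shards are reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names and structure are assumptions, not NiFi's actual classes.
final class ExpiredIndexSketch {
    // Earliest event timestamp (ms) held by each index shard -> shard name.
    private final TreeMap<Long, String> shardsByStart = new TreeMap<>();

    void addShard(long earliestEventTime, String name) {
        shardsByStart.put(earliestEventTime, name);
    }

    // Clamp a query's start time so it is never earlier than the first
    // unexpired event; expired documents left in the index are never matched.
    static long clampStart(long requestedStart, long earliestLiveEventTime) {
        return Math.max(requestedStart, earliestLiveEventTime);
    }

    // A shard is fully expired once the NEXT shard's earliest event is at or
    // before the earliest live event. The newest shard has no successor and
    // is never reported, since it may still be receiving writes.
    List<String> fullyExpiredShards(long earliestLiveEventTime) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<Long, String> entry : shardsByStart.entrySet()) {
            Long nextStart = shardsByStart.higherKey(entry.getKey());
            if (nextStart == null || nextStart > earliestLiveEventTime) {
                break; // this shard and all later ones may still hold live events
            }
            expired.add(entry.getValue());
        }
        return expired;
    }
}
```

Each shard returned by `fullyExpiredShards` can then be closed and removed as a whole directory, which avoids per-document deletes against the index.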



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
