nifi-commits mailing list archives

From "Joe Skora (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-994) Processor to tail files
Date Wed, 30 Sep 2015 18:40:05 GMT

    [ https://issues.apache.org/jira/browse/NIFI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938244#comment-14938244 ]

Joe Skora commented on NIFI-994:
--------------------------------

I think we are on the same page, but I left out some details.  The key is that the processor
always starts reading at the beginning of any file it finds, but discards content it believes
was already committed downstream.

One approach would be to store a checksum of the processed content alongside the other state
whenever content is committed downstream.  Files are always read from the start, but those that
already exist when the processor starts are checked against the stored state.  If the file has
the same checksum at the same offset as the stored state, the content up to that offset is
discarded and the file is processed from there on.  If the checksum at that offset is different,
all of the content is processed.
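
A minimal sketch of that check, under some assumptions: the stored state is taken to be an
(offset, checksum) pair, the checksum is taken to cover the first `offset` bytes of the file,
and CRC32 is used only as a stand-in for whatever checksum is actually chosen.  The class and
method names (TailResumeSketch, checksumOfPrefix, resumeOffset) are hypothetical and are not
part of NiFi.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

/** Hypothetical helper illustrating the checksum-at-offset idea; not NiFi code. */
final class TailResumeSketch {

    /** Computes a CRC32 over the first {@code length} bytes of {@code file}. */
    static long checksumOfPrefix(Path file, long length) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        long remaining = length;
        try (InputStream in = Files.newInputStream(file)) {
            while (remaining > 0) {
                int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    throw new IOException("File is shorter than " + length + " bytes");
                }
                crc.update(buffer, 0, read);
                remaining -= read;
            }
        }
        return crc.getValue();
    }

    /**
     * Returns the offset from which content should be sent downstream.
     * If the file's first storedOffset bytes match the stored checksum, that
     * prefix was already committed and is skipped; otherwise the file is
     * treated as new and processed from byte 0.
     */
    static long resumeOffset(Path file, long storedOffset, long storedChecksum)
            throws IOException {
        if (storedOffset <= 0 || Files.size(file) < storedOffset) {
            return 0L; // no usable state, or the file is shorter than what was processed
        }
        return checksumOfPrefix(file, storedOffset) == storedChecksum ? storedOffset : 0L;
    }
}
```

On startup the processor would call resumeOffset(file, storedOffset, storedChecksum) for each
file that already exists and skip that many bytes before emitting anything.  A matching
checksum only shows that the prefix is unchanged, so a stronger hash may be preferable if
accidental collisions are a concern.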

Any content that ages off while the processor is stopped will be lost, but I don't see a way
around that.  That said, it might be possible to recognize some log-rolling scenarios and
finish processing rolled-over files that were previously in process while the regular behavior
picks up the new file.

> Processor to tail files
> -----------------------
>
>                 Key: NIFI-994
>                 URL: https://issues.apache.org/jira/browse/NIFI-994
>             Project: Apache NiFi
>          Issue Type: New Feature
>    Affects Versions: 0.4.0
>            Reporter: Joseph Percivall
>            Assignee: Joseph Percivall
>
> It's a very common data ingest situation to want to input text into the system by "tailing" a file, most commonly log files. Currently we don't have an easy way to do this.
> A simple processor to tail a file would benefit many users. There would need to be an option to not just tail a file but pick up where the processor left off if it is interrupted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
