nifi-users mailing list archives

From David Klim <>
Subject RE: Generate flowfiles from flowfile content
Date Wed, 23 Sep 2015 22:41:21 GMT
Hello Bryan,
I should have been more specific. What I am trying to do is fetch files from S3. I am using
the GetSQS processor to receive new-object events, and each event is a JSON document listing
the new objects (files) in the bucket. The output of GetSQS is processed by SplitJson,
which gives me one flowfile per object key (filename). I need to feed these into FetchS3Object
to retrieve the actual files, but FetchS3Object expects the filename attribute (or some other
attribute) to hold the object key, not the content. So the problem is really moving the filename
string from the flowfile content into an attribute.
If there is no other alternative, I will implement this processor.
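The step described above amounts to copying a flowfile's content into an attribute that FetchS3Object can read. In NiFi itself, EvaluateJsonPath with its Destination property set to flowfile-attribute (or ExtractText) may cover this without a custom processor; the plain-Java sketch below only models the data movement. The byte[]/Map stand-in for a FlowFile, the `promote` helper and the quote-stripping rule are illustrative assumptions, not the NiFi API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ContentToAttribute {

    // Decodes FlowFile content as UTF-8 and copies it into the named attribute.
    // The byte[] and Map stand in for a NiFi FlowFile; this is NOT the NiFi API.
    static Map<String, String> promote(byte[] content, Map<String, String> attributes, String attributeName) {
        String value = new String(content, StandardCharsets.UTF_8).trim();
        // Assumption: if the split left the key as a quoted JSON string, drop the quotes.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        if (value.isEmpty()) {
            throw new IllegalArgumentException("FlowFile content is empty; no object key to promote");
        }
        // Copy so the caller's attribute map is left untouched.
        Map<String, String> updated = new HashMap<>(attributes);
        updated.put(attributeName, value);
        return updated;
    }

    public static void main(String[] args) {
        // Hypothetical SplitJson output: one object key per FlowFile.
        byte[] content = "logs/2015-09-23/part-0001.json\n".getBytes(StandardCharsets.UTF_8);
        Map<String, String> result = promote(content, new HashMap<>(), "filename");
        System.out.println(result.get("filename")); // prints logs/2015-09-23/part-0001.json
    }
}
```

The attribute name is a parameter so the same logic works whether the downstream processor reads `filename` or some other attribute.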

Subject: RE: Generate flowfiles from flowfile content
Date: Wed, 23 Sep 2015 19:59:21 +0000

Good idea, Adam.
I will post a separate review thread on the dev@ list to track comments.
Here’s the repository link:
From: Adam Taft []

Sent: Wednesday, September 23, 2015 1:48 PM


Subject: Re: Generate flowfiles from flowfile content

Not speaking for the entire community, but I am sure that such a contribution would (at minimum)
be appreciated for review, consideration and potential inclusion. Ideally, the source code would
be hosted somewhere the rest of the community could go to review it. Maybe you could
host the GetFileData and PutFileData processors in a GitHub repository?

I think the idea you proposed is good, but it might need to be aligned with the work (if any)
on the referenced ListFile and FetchFile implementations. Ideally, the differences between your
PutFileData and PutFile would be well vetted as well.

On Wed, Sep 23, 2015 at 2:23 PM, Rick Braddy <> wrote:

We have already developed a modified GetFile, called GetFileData, that takes an incoming
FlowFile containing the path to the file or directory that needs to be transferred.
There is a corresponding PutFileData on the other side that accepts the incoming file or directory,
creates the directory tree as needed or writes the file, and then sets the permissions and
ownership. GetFileData also receives a file.rootdir attribute that
gets passed along to PutFileData, so it can rebase the original file's location relative
to the configured target directory. Unlike GetFile/PutFile, these processors work with entire
directory trees and are triggered by incoming FlowFiles to GetFileData.
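The file.rootdir rebasing described above can be sketched with java.nio.file.Path: relativize the original location against the root directory, then resolve the result against the target directory. This is a hypothetical illustration of the idea, not the GetFileData/PutFileData code (which is not shown in this thread); the `rebase` helper and its parameters are assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class RebasePath {

    // Hypothetical helper: maps a source path that lies under rootDir (the value that
    // would travel in the file.rootdir attribute) to the same relative location under targetDir.
    static Path rebase(String originalPath, String rootDir, String targetDir) {
        Path original = Paths.get(originalPath).normalize();
        Path root = Paths.get(rootDir).normalize();
        // Refuse paths outside the root so a crafted path cannot escape the target directory.
        if (!original.startsWith(root)) {
            throw new IllegalArgumentException(original + " is not under " + root);
        }
        Path relative = root.relativize(original); // e.g. 2015/a.csv
        return Paths.get(targetDir).resolve(relative).normalize();
    }

    public static void main(String[] args) {
        // Prints /mnt/incoming/2015/a.csv on a POSIX system.
        System.out.println(rebase("/data/export/2015/a.csv", "/data/export", "/mnt/incoming"));
    }
}
```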
Eventually, we want to further enhance these two processors so they can break large files
into "chunks" and send them as multi-part files that get reassembled by PutFileData, resolving
the limitations associated with huge files and content repository size; e.g., properties with
defaults of a 100MB chunking threshold and a 10MB chunk size will control the chunking, if enabled.
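As a rough illustration of those two properties, the sketch below splits a file larger than a 100MB threshold into 10MB pieces, each described by an offset and a length that a receiver could use to reassemble the file in order. The `plan` method, the strict "greater than the threshold" rule and the use of binary megabytes (1 MB = 1,048,576 bytes) are assumptions; the thread does not say how PutFileData will reassemble chunks.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkPlanner {

    // Assumption: binary megabytes.
    static final long MB = 1024L * 1024L;

    // Returns {offset, length} pairs describing how a file of fileSize bytes would be sent.
    // Assumption: only files strictly larger than thresholdBytes are chunked.
    static List<long[]> plan(long fileSize, long thresholdBytes, long chunkBytes) {
        if (fileSize < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("file size must be >= 0 and chunk size > 0");
        }
        List<long[]> chunks = new ArrayList<>();
        if (fileSize <= thresholdBytes) {
            chunks.add(new long[] {0L, fileSize}); // small file: sent as a single piece
            return chunks;
        }
        for (long offset = 0; offset < fileSize; offset += chunkBytes) {
            // The last chunk may be shorter than chunkBytes.
            chunks.add(new long[] {offset, Math.min(chunkBytes, fileSize - offset)});
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<long[]> chunks = plan(250 * MB, 100 * MB, 10 * MB);
        System.out.println("250 MB file -> " + chunks.size() + " chunks"); // prints 250 MB file -> 25 chunks
    }
}
```

Carrying the offset with each chunk means the pieces could be written in any arrival order, which is one reasonable design choice for multi-part transfer.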
If the community is interested and would benefit from these processors, we're happy to consider
further generalizing and contributing them, along with any further refinements
based on community review and feedback.
I believe these processors would address both the Jira and David’s original inquiry.
From: Adam Taft []

Sent: Wednesday, September 23, 2015 1:09 PM


Subject: Re: Generate flowfiles from flowfile content


Right.  This would be the use case that FetchFile [1] would help solve.

On Wed, Sep 23, 2015 at 1:11 PM, Bryan Bende <> wrote:

Hi David,


When you say "files I need to retrieve", are you referring to files on the local filesystem
where NiFi is running?


If so, I am not aware of an existing processor that does that. Currently we have GetFile, which
polls a directory, but that is not what you want here.


It would be fairly straightforward to implement with a custom processor, though. You would
read the incoming FlowFile content to get the filename, then create a new FlowFile with
your desired name and write the content of the local file to the new FlowFile.
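That outline has three steps: read the path from the incoming content, load the local file, and emit a new flowfile named after it. The sketch below models those steps in plain Java with a hypothetical SimpleFlowFile stand-in. In an actual NiFi processor they would correspond to ProcessSession calls such as read, importFrom and putAttribute, and large files should be streamed rather than loaded with Files.readAllBytes as done here for brevity.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class FetchLocalFile {

    // Hypothetical stand-in for a NiFi FlowFile: content bytes plus attributes (not the NiFi API).
    static final class SimpleFlowFile {
        final byte[] content;
        final Map<String, String> attributes;

        SimpleFlowFile(byte[] content, Map<String, String> attributes) {
            this.content = content;
            this.attributes = attributes;
        }
    }

    static SimpleFlowFile fetch(SimpleFlowFile incoming) throws IOException {
        // 1. Read the incoming content to get the path of the local file.
        String pathText = new String(incoming.content, StandardCharsets.UTF_8).trim();
        Path path = Paths.get(pathText);
        // 2. Load the local file (whole file in memory: fine only for small files).
        byte[] data = Files.readAllBytes(path);
        // 3. Build the new FlowFile, keeping existing attributes and naming it after the file.
        Map<String, String> attrs = new HashMap<>(incoming.attributes);
        Path name = path.getFileName();
        attrs.put("filename", name == null ? pathText : name.toString());
        return new SimpleFlowFile(data, attrs);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nifi-demo", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        SimpleFlowFile in = new SimpleFlowFile(tmp.toString().getBytes(StandardCharsets.UTF_8), new HashMap<>());
        SimpleFlowFile out = fetch(in);
        System.out.println(out.attributes.get("filename") + " -> " + new String(out.content, StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```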

On Wed, Sep 23, 2015 at 11:16 AM, David Klim <> wrote:

In a flow I am defining, I receive a flowfile containing a JSON string. Using the SplitJson
processor I can extract some JSON paths pointing to files I need to retrieve, but
the filename ends up as the content of the generated flowfile. So I would need to read
that content and generate a flowfile with that name instead. How could I do that?