nifi-dev mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: ListSFTP incoming relationship
Date Tue, 27 Mar 2018 13:56:47 GMT
I'm not sure that would solve the problem because you'd still be
limited to one directory. What most people are asking for is the
ability to use a dynamic directory from an incoming flow file.

I think we may be trying to fit two different use cases into one
processor, which might not make sense.

Scenario #1... There is a directory that is constantly receiving new
data and holds a large number of files, and I want to periodically
find new files. This is what the current processors are optimized for.
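The Scenario #1 pattern, periodically finding only the new files in a busy directory, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not ListFile's actual implementation. `ListingState` and `list_new_files` are hypothetical names, and the state is a modification-time watermark plus the names seen at that timestamp, kept in memory rather than in NiFi's state manager.

```python
import os
from dataclasses import dataclass, field


@dataclass
class ListingState:
    """Hypothetical listing state: newest mtime seen, plus names at that mtime."""
    latest_mtime: float = -1.0
    names_at_latest: set = field(default_factory=set)


def list_new_files(directory: str, state: ListingState) -> list:
    """Return paths of files not reported by earlier calls, updating state in place."""
    new_files = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            mtime = entry.stat().st_mtime
            # Skip anything older than the watermark, and anything at the
            # watermark that was already reported by a previous call.
            if mtime < state.latest_mtime:
                continue
            if mtime == state.latest_mtime and entry.name in state.names_at_latest:
                continue
            new_files.append((mtime, entry.path))

    # Advance the watermark; remember every name that sits exactly on it so
    # files sharing the newest timestamp are not reported twice.
    for mtime, path in new_files:
        name = os.path.basename(path)
        if mtime > state.latest_mtime:
            state.latest_mtime = mtime
            state.names_at_latest = {name}
        elif mtime == state.latest_mtime:
            state.names_at_latest.add(name)
    return sorted(path for _, path in new_files)
```

A watermark keeps the state small even when the directory holds a very large number of files. The trade-off is that a file arriving with a modification time older than the watermark would be missed.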

Scenario #2... There is a directory that is mostly static with a
moderate/small number of files, and at points in my flow I want to
dynamically perform a listing of this directory and retrieve the
files. This is more geared towards the mentality of running a
job/workflow.
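Scenario #2 needs no listing state at all: each incoming flow file names a directory, and every trigger lists that directory in full. A minimal sketch, assuming the flow file is modelled as a plain dict of attributes and that the directory arrives in a hypothetical `list.directory` attribute (not an existing NiFi property or attribute):

```python
import os
from typing import Dict, List

# Hypothetical attribute name used only for this sketch.
DIRECTORY_ATTRIBUTE = "list.directory"


def list_on_demand(incoming: Dict[str, str]) -> List[Dict[str, str]]:
    """List every regular file in the directory named by the incoming attributes.

    No state is kept between calls, so each trigger performs a full listing.
    """
    directory = incoming[DIRECTORY_ATTRIBUTE]
    with os.scandir(directory) as entries:
        files = sorted((e for e in entries if e.is_file()), key=lambda e: e.name)
    results = []
    for entry in files:
        # Each listed file becomes one outgoing record that inherits the
        # incoming attributes, so downstream steps keep the job context.
        record = dict(incoming)
        record["filename"] = entry.name
        record["path"] = directory
        results.append(record)
    return results
```

Because nothing is remembered between triggers, listing the same directory twice yields the same results twice. That is acceptable for a mostly static, small directory but would be wasteful in the Scenario #1 case.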




On Tue, Mar 27, 2018 at 9:36 AM, Otto Fowler <ottobackwards@gmail.com> wrote:
> What if the changes were ‘on top of’ some base set of properties, like
> directory? Like a filter: if present on the incoming flow file, it would
> have the List* processor list only things that match a name or attribute?
>
> On March 27, 2018 at 00:08:41, Joe Witt (joe.witt@gmail.com) wrote:
>
> Scott
>
> This idea has come up a couple of times and there is definitely
> something intriguing to it. Where I think this idea stalls out though
> is in implementation.
>
> While I agree that the other List* processors might similarly benefit
> lets focus on ListFile. Today you tell ListFile what directory to
> start looking for files in. It goes off scanning that directory for
> hits and stores state about what it has already searched/seen. And it
> is important to keep track of how much it has already scanned because
> at times the search directory can be massive (100,000s of thousands or
> more files and directories to scan for example).
>
> In the proposed model the directory to be scanned could be provided
> dynamically by looking at an attribute of an incoming flowfile (or
> other criteria could be provided, not just the directory to scan). In
> this case the ListFile processor would start scanning that directory instead.
> What about the previous directory (or directories) it was told to
> scan? Does it still track those too? What if it starts scanning the
> newly provided directory, hasn't finished pulling all the data (or new
> data is continually arriving), and it is told to switch to another
> directory?
>
> I think if those questions can get solid answers and someone invests
> time in creating a PR then this could be pretty powerful. Would be
> good to see a written description of the use case(s) for this too.
>
> Thanks
> Joe
>
> On Mon, Mar 26, 2018 at 11:58 PM, scott <tcots8888@gmail.com> wrote:
>> Hello Devs,
>>
>> I would like to request a feature for a major processor, ListSFTP. But before
>> I go down the official road, I wanted to ask if anyone thought it was a
>> terrible idea or impossible, etc. The request is to add support for an
>> incoming relationship to the ListSFTP processor specifically, but I could
>> see it added to many of the commonly used head processors, such as ListFile.
>> I would envision functionality more like InvokeHTTP or ExecuteSQL, where an
>> incoming flow file could initiate the action, and the attributes in the
>> incoming flow file could be used to configure the processor actions. It's
>> the configuration aspect that most appeals to me, because it opens it up to
>> being centrally or dynamically configured.
>>
>> Thanks,
>>
>> Scott
>>
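Joe's question about what happens to previously requested directories, combined with Otto's filter suggestion, can be made concrete with one possible design: keep separate "already seen" state per directory and apply an optional name filter supplied by the incoming flow file. The sketch below is hypothetical Python, not a proposal made in the thread; `PerDirectoryLister`, the glob-style filter, and the set-of-names state are all assumptions chosen for brevity.

```python
import fnmatch
import os
from typing import Dict, List, Optional, Set


class PerDirectoryLister:
    """Hypothetical lister that keeps separate 'already seen' state per directory.

    Switching to a new directory does not discard progress on earlier ones.
    Tracking every name is simple but grows without bound; very large
    directories would need a cheaper scheme such as a timestamp watermark.
    """

    def __init__(self) -> None:
        self.seen: Dict[str, Set[str]] = {}

    def list_new(self, directory: str, name_filter: Optional[str] = None) -> List[str]:
        """Return names of unseen files in `directory`, optionally glob-filtered."""
        seen = self.seen.setdefault(directory, set())
        new_names = []
        with os.scandir(directory) as entries:
            for entry in entries:
                if not entry.is_file() or entry.name in seen:
                    continue
                # Otto's suggestion: an optional filter narrows what is listed.
                if name_filter is not None and not fnmatch.fnmatchcase(entry.name, name_filter):
                    continue
                new_names.append(entry.name)
        seen.update(new_names)
        return sorted(new_names)
```

In this sketch, files rejected by the filter are not marked as seen, so a later request with a different filter (or none) can still list them. Whether that is the desired behavior is exactly the kind of question Joe suggests answering before a PR.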
