nifi-dev mailing list archives

From Andrew Grande <apere...@gmail.com>
Subject Re: ListSFTP incoming relationship
Date Tue, 27 Mar 2018 15:33:38 GMT
The key here is that a ListXXX processor maintains state, and the directory
is part of that state. Allowing arbitrary directories via an expression would
create a never-ending stream of new entries in the state storage, effectively
engineering a distributed DoS attack on the NiFi node or on the shared
ZooKeeper (ZK) quorum (when state is stored there).

Maybe we should focus on the assumptions and restrictions the processor
should make to contain that risk...

Andrew
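One restriction of the kind Andrew suggests can be sketched in plain Java: cap the number of directories whose listing state is kept, and evict the least recently used one when the cap is exceeded. This is a minimal sketch under stated assumptions. `BoundedListingState`, `maxDirectories`, `recordListed` and `lastListed` are hypothetical names, not part of NiFi's API, and keeping only the newest modification time per directory is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not NiFi's API: per-directory listing state with a hard cap,
// so an unbounded stream of directory values cannot grow state without limit.
public class BoundedListingState {
    private final int maxDirectories;
    // Directory -> newest modification time already listed (simplified state).
    private final LinkedHashMap<String, Long> latestTimestampByDir;

    public BoundedListingState(int maxDirectories) {
        this.maxDirectories = maxDirectories;
        // accessOrder = true: iteration order is least recently accessed first.
        this.latestTimestampByDir = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                // Evict the least recently used directory once the cap is exceeded.
                return size() > BoundedListingState.this.maxDirectories;
            }
        };
    }

    /** Newest timestamp already listed for dir, or -1 if never listed (or evicted). */
    public long lastListed(String dir) {
        return latestTimestampByDir.getOrDefault(dir, -1L);
    }

    /** Records that files up to newestTimestamp have been listed for dir. */
    public void recordListed(String dir, long newestTimestamp) {
        Long previous = latestTimestampByDir.get(dir);
        long next = previous == null ? newestTimestamp : Math.max(previous, newestTimestamp);
        latestTimestampByDir.put(dir, next);
    }

    public int trackedDirectories() {
        return latestTimestampByDir.size();
    }

    public static void main(String[] args) {
        BoundedListingState state = new BoundedListingState(2);
        state.recordListed("/data/a", 100L);
        state.recordListed("/data/b", 200L);
        state.recordListed("/data/c", 300L); // cap exceeded: /data/a is evicted
        System.out.println(state.trackedDirectories()); // 2
        System.out.println(state.lastListed("/data/a")); // -1
        System.out.println(state.lastListed("/data/c")); // 300
    }
}
```

The trade-off is that an evicted directory is treated as never listed, so if it shows up again its files would be listed again (duplicates). The cap trades that duplication risk for a bound on state size.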

On Tue, Mar 27, 2018, 9:56 AM Bryan Bende <bbende@gmail.com> wrote:

> I'm not sure that would solve the problem because you'd still be
> limited to one directory. What most people are asking for is the
> ability to use a dynamic directory from an incoming flow file.
>
> I think we might be trying to fit two different use-cases into one
> processor which might not make sense.
>
> Scenario #1... There is a directory that is constantly receiving new
> data and has a significant number of files, and I want to periodically
> find new files. This is what the current processors are optimized for.
>
> Scenario #2... There is a directory that is mostly static with a
> moderate/small number of files, and at points in my flow I want to
> dynamically perform a listing of this directory and retrieve the
> files. This is more geared towards the mentality of running a
> job/workflow.
>
> On Tue, Mar 27, 2018 at 9:36 AM, Otto Fowler <ottobackwards@gmail.com>
> wrote:
> > What if the changes were ‘on top of’ some base set of properties, like
> > directory? Like a filter: if present on the incoming flow file, it would
> > have the LIST* processor list only the things that match a name or
> > attribute?
> >
> > On March 27, 2018 at 00:08:41, Joe Witt (joe.witt@gmail.com) wrote:
> >
> > Scott
> >
> > This idea has come up a couple of times and there is definitely
> > something intriguing to it. Where I think this idea stalls out, though,
> > is in implementation.
> >
> > While I agree that the other List* processors might similarly benefit,
> > let's focus on ListFile. Today you tell ListFile what directory to
> > start looking for files in. It goes off scanning that directory for
> > hits and stores state about what it has already searched/seen. It is
> > important to keep track of how much it has already scanned because at
> > times the search directory can be massive (hundreds of thousands of
> > files and directories or more to scan, for example).
> >
> > In the proposed model, the directory to be scanned could be provided
> > dynamically by looking at an attribute of an incoming flowfile (or
> > other criteria could be provided, not just the directory to scan). In
> > this case the ListFile processor now scans that directory instead.
> > What about the previous directory (or directories) it was told to
> > scan? Does it still track those too? What if it starts scanning the
> > newly provided directory, hasn't finished pulling all the data (or new
> > data is continually arriving), and it is told to switch to another
> > directory?
> >
> > I think if those questions can get solid answers and someone invests
> > time in creating a PR then this could be pretty powerful. Would be
> > good to see a written description of the use case(s) for this too.
> >
> > Thanks
> > Joe
> >
> > On Mon, Mar 26, 2018 at 11:58 PM, scott <tcots8888@gmail.com> wrote:
> >> Hello Devs,
> >>
> >> I would like to request a feature for a major processor, ListSFTP. But
> >> before I go down the official road, I wanted to ask if anyone thought it
> >> was a terrible idea or impossible, etc. The request is to add support for
> >> an incoming relationship to the ListSFTP processor specifically, but I
> >> could see it added to many of the commonly used head processors, such as
> >> ListFile. I would envision functionality more like InvokeHTTP or
> >> ExecuteSQL, where an incoming flow file could initiate the action, and the
> >> attributes in the incoming flow file could be used to configure the
> >> processor actions. It's the configuration aspect that most appeals to me,
> >> because it opens it up to being centrally or dynamically configured.
> >>
> >> Thanks,
> >>
> >> Scott
> >>
>
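Bryan's scenario #2 (an on-demand listing of a small, mostly static directory) can be contrasted with the stateful periodic listing of scenario #1 using a stateless helper. This is a minimal sketch in plain `java.nio.file`, not a NiFi processor. `OnDemandListing` and `listOnDemand` are hypothetical names, and the directory argument stands in for a value that would come from an incoming flow file attribute.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of scenario #2: a stateless, on-demand directory listing.
public class OnDemandListing {
    // Lists regular files directly under dir, sorted by path.
    // Nothing is remembered between calls, so every call returns the full listing.
    public static List<Path> listOnDemand(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) { // close the directory stream
            return entries.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("listing-demo");
        Files.createFile(dir.resolve("b.csv"));
        Files.createFile(dir.resolve("a.csv"));
        Files.createDirectory(dir.resolve("sub")); // subdirectories are skipped
        // Two calls give the same result because no state is kept.
        System.out.println(listOnDemand(dir).size()); // 2
        System.out.println(listOnDemand(dir).size()); // 2
    }
}
```

Because each call rescans the whole directory, its cost grows with the directory size every time it is triggered. That is acceptable for scenario #2 but is exactly what the stateful listing in scenario #1 is designed to avoid.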
