nifi-dev mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: [DISCUSS] Streaming or "lazy" mode for `CompressContent`
Date Tue, 30 Jul 2019 16:42:25 GMT
Edward,

I like your point regarding separation of concerns/cohesion.  I think we
could/should consider automatically decompressing data on the fly for
processors in general, in the event we know a given set of data is
compressed but is being accessed for plaintext purposes.  For general block
compression types this is probably fair game and could be quite compelling,
particularly to avoid the extra read/write/content repo hits involved.
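
As a rough sketch of what that could look like (the class and method names
here are illustrative, not existing NiFi API), the framework could sniff the
gzip magic number and wrap the stream before handing it to the processor:

    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.zip.GZIPInputStream;

    // Illustrative helper: transparently unwrap gzip-compressed content so
    // a processor reading "plaintext" never sees the compressed bytes.
    public final class TransparentDecompress {

        public static InputStream maybeDecompress(final InputStream in) throws IOException {
            final BufferedInputStream buffered = new BufferedInputStream(in);
            buffered.mark(2);
            final int b1 = buffered.read();
            final int b2 = buffered.read();
            buffered.reset();
            // gzip streams start with the magic bytes 0x1f 0x8b
            if (b1 == 0x1f && b2 == 0x8b) {
                return new GZIPInputStream(buffered);
            }
            return buffered;
        }
    }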

That said, for the case of record readers/writers I'm not sure we can
avoid having a specific solution.  Some compression types can be
concatenated together and some cannot.  Some record types would remain
tolerant/valid and some would not.
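
For example, with Apache Commons Compress (used here purely to illustrate
the concatenation point, not as a proposed implementation), gzip members can
be appended to one another and still read back as one logical stream, while
other codecs offer no such guarantee:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

    public class GzipConcatDemo {

        private static byte[] gzip(final String s) throws IOException {
            final ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(s.getBytes(StandardCharsets.UTF_8));
            }
            return bos.toByteArray();
        }

        public static void main(final String[] args) throws IOException {
            final ByteArrayOutputStream cat = new ByteArrayOutputStream();
            cat.write(gzip("first,record\n"));   // one complete gzip member
            cat.write(gzip("second,record\n"));  // a second member appended

            // decompressConcatenated=true reads every member as one stream;
            // with false, reading would stop after the first member.
            try (GzipCompressorInputStream in = new GzipCompressorInputStream(
                    new ByteArrayInputStream(cat.toByteArray()), true)) {
                System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }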

Thanks
Joe

On Tue, Jul 30, 2019 at 12:34 PM Edward Armes <edward.armes@gmail.com>
wrote:

> So while I agree with it in principle, and it is a good idea on paper, my
> concern is that this starts to add a bolt-on bloat problem. The NiFi
> processors as they stand do in general follow the Unix philosophy (do one
> thing, and do it well). While it could just be a case of adding a wrapper
> here, it then becomes an ask to add the same wrapper to other processors
> for similar functionality. That starts to cause a technical debt problem
> and potentially a detrimental experience for the user. Some of this I
> mentioned in the previous thread about re-structuring the NiFi core.
>
> The reason I suggest doing it either at the repo level, or as the
> InputStream is handed over to the processor from the core, is that it adds
> a global piece of functionality that every processor handling data that
> compresses well could benefit from. Ideally it would be nice to see it as
> a "per-flow" setting, but I suspect that would add more complexity than is
> actually needed.
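>
> As a sketch of the idea (ContentRepo here is a stand-in interface, not
> NiFi's actual ContentRepository API), a decorator at the repo boundary
> could compress on write and decompress on read, so every processor sees
> plain content:
>
>     import java.io.IOException;
>     import java.io.InputStream;
>     import java.io.OutputStream;
>     import java.util.zip.GZIPInputStream;
>     import java.util.zip.GZIPOutputStream;
>
>     // Stand-in for the repository boundary; not NiFi's real interface.
>     interface ContentRepo {
>         OutputStream write(String claimId) throws IOException;
>         InputStream read(String claimId) throws IOException;
>     }
>
>     // Decorator: content is gzipped as it lands on disk and unwrapped
>     // again as it is handed back to a processor.
>     final class CompressingRepo implements ContentRepo {
>         private final ContentRepo delegate;
>
>         CompressingRepo(final ContentRepo delegate) {
>             this.delegate = delegate;
>         }
>
>         @Override
>         public OutputStream write(final String claimId) throws IOException {
>             return new GZIPOutputStream(delegate.write(claimId));
>         }
>
>         @Override
>         public InputStream read(final String claimId) throws IOException {
>             return new GZIPInputStream(delegate.read(claimId));
>         }
>     }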
>
> I have seen an issue where, over time, the content repo took up quite a
> chunk of disk for a multi-tenanted cluster that performed lots of small
> changes on lots of FlowFiles. While those hosts were under-resourced,
> being able to store the content compressed, trading some speed of data
> through the flow for disk space, might have helped that situation quite a
> bit.
>
> Edward
>
> On Tue, Jul 30, 2019 at 4:21 PM Joe Witt <joe.witt@gmail.com> wrote:
>
> > Malthe
> >
> > I do see value in having the Record readers/writers understand and handle
> > compression directly, as it will avoid the extra disk hit of
> > decompress/read/compress cycles using the existing processors, and
> > further there are cases where the compression is record specific and not
> > just holistic block compression.
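> >
> > To make the record-specific case concrete, here is one example of that
> > kind of format (Avro is my choice of illustration here, not something
> > proposed in the ticket): the codec compresses each data block inside the
> > file, rather than the whole file being one opaque gzip stream.
> >
> >     import java.io.File;
> >     import java.io.IOException;
> >
> >     import org.apache.avro.Schema;
> >     import org.apache.avro.file.CodecFactory;
> >     import org.apache.avro.file.DataFileWriter;
> >     import org.apache.avro.generic.GenericData;
> >     import org.apache.avro.generic.GenericDatumWriter;
> >     import org.apache.avro.generic.GenericRecord;
> >
> >     public class AvroBlockCodecDemo {
> >         public static void main(final String[] args) throws IOException {
> >             final Schema schema = new Schema.Parser().parse(
> >                 "{\"type\":\"record\",\"name\":\"Rec\","
> >                 + "\"fields\":[{\"name\":\"msg\",\"type\":\"string\"}]}");
> >
> >             final GenericRecord rec = new GenericData.Record(schema);
> >             rec.put("msg", "hello");
> >
> >             // The codec compresses each data block inside the file; the
> >             // container metadata stays readable, unlike wrapping the
> >             // whole file in a single gzip stream.
> >             try (DataFileWriter<GenericRecord> writer = new DataFileWriter<>(
> >                     new GenericDatumWriter<GenericRecord>(schema))) {
> >                 writer.setCodec(CodecFactory.deflateCodec(6));
> >                 writer.create(schema, new File("records.avro"));
> >                 writer.append(rec);
> >             }
> >         }
> >     }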
> >
> > I think Koji offered a great description of how to start thinking about
> > this.
> >
> > Thanks
> >
> > On Tue, Jul 30, 2019 at 10:47 AM Malthe <mborch@gmail.com> wrote:
> >
> > > In reference to NIFI-6496 [1], I'd like to open a discussion on adding
> > > compression support to flow files such that a processor such as
> > > `CompressContent` might function in a streaming or "lazy" mode.
> > >
> > > Context, more details and initial feedback can be found in the ticket
> > > referenced below as well as in a related SO entry [2].
> > >
> > > [1] https://issues.apache.org/jira/browse/NIFI-6496
> > > [2] https://stackoverflow.com/questions/57005564/using-convertrecord-on-compressed-input
