nifi-dev mailing list archives

From Bryan Bende <>
Subject Re: Two issues relating to a processor I'm developing
Date Mon, 30 Oct 2017 13:32:21 GMT

Regarding the licensing, I believe LGPL is a no-go for Apache projects.

Take a look here:


On Sat, Oct 28, 2017 at 4:47 PM, Mike Thomsen <> wrote:
> The processor breaks down a much larger file into a huge number of small
> data points. We're talking like turning a 1.1M line file into about 2.5B
> data points.
> My current approach is "read a file with GetFile, save to /tmp, break down
> into a bunch of large CSV record batches (like a few hundred thousand
> records per group)" and then commit.
> It's slow, and with some good debugging statements, I can see the processor
> tearing into the data just fine. However, I am thinking about adding a
> variant to this which would be an "iterative" version that would follow
> this pattern:
> "read the file, save to /tmp, load the file, keep the current read position
> intact, every onTrigger call sends out a batch w/ session.commit() until
> it's done reading. Then grab the next flowfile."
> Does anyone have any suggestions on good practices to follow here,
> potential concerns, etc.? (Note: I have to write the file to /tmp because a
> library I am using which I don't want to fork doesn't have an API that can
> read from a stream rather than a file.)
> Also, are there any issues with accepting a contribution that makes use of
> a LGPL-licensed library, in the event that my client wants to open source
> it (we think they will)?
> Thanks,
> Mike
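
The "iterative" variant described in the quoted message (spool the input to a temp file, keep the read position, emit one batch per call) could be sketched roughly as below. This is only an illustration in plain Java with no NiFi dependency: `BatchCursor`, `spool`, and `nextBatch` are hypothetical names, not part of NiFi or of the library Mike mentions, and the sketch assumes line-oriented UTF-8 input. In a processor, each onTrigger call would presumably ask for one batch, write it out, and call session.commit().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a NiFi class): reads a spooled file in batches across calls. */
final class BatchCursor implements AutoCloseable {
    private final Path spooled;          // temp copy, for libraries that need a file on disk
    private final BufferedReader reader; // stays open, so the read position survives between calls
    private final int batchSize;

    private BatchCursor(Path spooled, BufferedReader reader, int batchSize) {
        this.spooled = spooled;
        this.reader = reader;
        this.batchSize = batchSize;
    }

    /** Copies the stream to a temp file and opens a cursor at its first line. */
    static BatchCursor spool(InputStream in, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        try {
            Path tmp = Files.createTempFile("spool-", ".tmp");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            BufferedReader r = Files.newBufferedReader(tmp, StandardCharsets.UTF_8);
            return new BatchCursor(tmp, r, batchSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns up to batchSize lines; an empty list means the file is exhausted. */
    List<String> nextBatch() {
        List<String> batch = new ArrayList<>();
        try {
            String line;
            // Check the size first so a line is never read and then dropped.
            while (batch.size() < batchSize && (line = reader.readLine()) != null) {
                batch.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return batch;
    }

    /** Closes the reader and removes the temp file. */
    @Override
    public void close() {
        try {
            reader.close();
            Files.deleteIfExists(spooled);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One concern with this shape: the read position lives only in memory, so a restart in the middle of a file loses it and the file would have to be re-read from the start (or the position persisted somewhere).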
