nifi-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [nifi] markap14 commented on issue #3850: NIFI-6398 Added the 'replace first' and 'replace all' strategy to ReplaceText
Date Mon, 28 Oct 2019 20:27:20 GMT
markap14 commented on issue #3850: NIFI-6398 Added the 'replace first' and 'replace all' strategy
to ReplaceText
URL: https://github.com/apache/nifi/pull/3850#issuecomment-547131105
 
 
   @HorizonNet can you explain the difference between "Replace All" and "Regex"? I believe
they are intended to accomplish the same thing, but the existing Regex strategy lets the user
switch the evaluation mode between Line-by-Line and Entire Text. Line-by-Line mode is preferred
when the regex does not span multiple lines because it uses dramatically less heap. The Regex
strategy also allows for back references, etc. Does the new "Replace All" provide a capability
that isn't currently supported and that I'm just missing?
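
To illustrate the heap argument, here is a minimal sketch of the two evaluation modes in plain Java. It is not the `ReplaceText` implementation: the class and method names are hypothetical, and the line-by-line variant normalizes every line ending to `\n`, which a real processor would not do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.regex.Pattern;

// Hypothetical illustration only; names are not taken from NiFi.
class LineByLineReplaceSketch {

    // Line-by-line: only one line is held in memory at a time,
    // so heap use is bounded by the longest line, not the whole content.
    static void replaceLineByLine(Reader in, Writer out, Pattern pattern, String replacement) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                // replaceAll understands back references such as $1 and $2.
                out.write(pattern.matcher(line).replaceAll(replacement));
                out.write('\n'); // simplification: original line endings are not preserved
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Entire text: the whole content must already be buffered as one String,
    // which is only needed when a match can span multiple lines.
    static String replaceEntireText(String text, Pattern pattern, String replacement) {
        return pattern.matcher(text).replaceAll(replacement);
    }
}
```

With the pattern `(\w+)=(\d+)` and the replacement `$2=$1`, both variants turn `a=1\nb=2\n` into `1=a\n2=b\n`. They differ only in how much content has to be in memory at once.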
   
   The Replace First strategy is a nice addition. We should ensure, though, that any time we
create a `String` from a `byte[]` we pass the character set to the constructor. It looks like
the character set is used when serializing the `String` back to a `byte[]` but not when
creating the `String` to begin with.
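
A minimal sketch of the symmetric decode/encode this asks for, assuming a literal "replace first" over fully buffered content. The class and method names are hypothetical and are not taken from the pull request.

```java
import java.nio.charset.Charset;

// Hypothetical illustration only; not the code under review.
class CharsetRoundTripSketch {

    // Replaces the first literal occurrence of `search`, using the same
    // charset for decoding and for encoding.
    static byte[] replaceFirstLiteral(byte[] content, Charset charset, String search, String replacement) {
        // Explicit charset on decode; new String(content) would silently
        // fall back to the JVM's default charset.
        String text = new String(content, charset);

        int idx = text.indexOf(search);
        String updated = idx < 0
                ? text
                : text.substring(0, idx) + replacement + text.substring(idx + search.length());

        // Explicit charset on encode as well.
        return updated.getBytes(charset);
    }
}
```

If the decode step omitted the charset, content containing non-ASCII characters could be misread whenever the JVM default differs from the configured character set, even though the encode step is correct.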
   
   Finally, rather than using `IOUtils.toByteArray()`, I would recommend creating a `byte[]`
and then using `StreamUtils.fillBuffer`, as is done in the `RegexReplace` strategy. Because
we already know the size of the FlowFile, this is far more efficient: it doesn't have to keep
filling a buffer, creating a new one, and copying bytes over. It can also cut down the amount
of heap used to store the buffered data by up to 50%. The `ByteArrayOutputStream` doubles the
buffer size every time it runs out of space, so if it's already 512 KB and we need one more
byte, it creates a 1 MB buffer just to hold 512 KB + 1 byte, whereas a direct allocation
takes exactly the right number of bytes.
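
The exact-size allocation can be sketched in plain Java as follows. The `readExactly` helper is a hypothetical stand-in for the fill-buffer utility mentioned above, not NiFi's actual method, and it assumes the known content size fits in an `int`.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not NiFi's utility class.
class ExactBufferSketch {

    // Reads exactly `size` bytes into a buffer allocated once at the right
    // length, so there is no growing, re-allocating, or copying.
    static byte[] readExactly(InputStream in, int size) {
        byte[] buffer = new byte[size];
        int offset = 0;
        try {
            while (offset < size) {
                // read() may return fewer bytes than requested, so loop until full.
                int n = in.read(buffer, offset, size - offset);
                if (n < 0) {
                    throw new EOFException("Expected " + size + " bytes but stream ended after " + offset);
                }
                offset += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer;
    }
}
```

A growing buffer can avoid neither the intermediate copies nor the final over-allocation, which is why knowing the size up front matters here.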

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
