nifi-users mailing list archives

From Russell Bateman <r...@windofkeltia.com>
Subject Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary
Date Thu, 02 Feb 2017 23:02:07 GMT
Could you use /RouteOnContent/ to determine what sort of content you're 
dealing with, then branch to different /ExecuteScript/ processors rigged 
to different Python scripts?
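Here is a rough sketch of the kind of test that such a routing step would apply. It is written as plain CPython 3, not as NiFi configuration, because RouteOnContent itself matches regular expressions set as processor properties. It assumes that "text" means NUL-free, valid UTF-8. `classify_content` and the "text"/"binary" labels are hypothetical names used only for illustration.

```python
def classify_content(data: bytes) -> str:
    """Hypothetical routing test: "text" if data is NUL-free, valid UTF-8, else "binary"."""
    if b"\x00" in data:
        # NUL bytes are a common heuristic signal of binary content.
        return "binary"
    try:
        data.decode("utf-8")  # strict decode: raises on malformed sequences
    except UnicodeDecodeError:
        return "binary"
    return "text"


if __name__ == "__main__":
    samples = {
        "xml": "<doc><name>Zoë</name></doc>".encode("utf-8"),
        "mixed": b"<doc>" + bytes([0x00, 0xFF, 0xFE]) + b"</doc>",
    }
    for name, payload in samples.items():
        print(name, "->", classify_content(payload))
```

Under this heuristic, valid UTF-8 that contains foreign-language characters still counts as text. Only payloads that are genuinely not UTF-8 would take the other branch.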

Hope this comment is helpful.


On 02/02/2017 03:38 PM, James McMahon wrote:
>
> I have a flowfile that has tagged character information I need to get 
> at throughout the first few sections of the file. I need to use regex 
> in python to select some of those values and to transform others. I am 
> using an ExecuteScript processor to execute my python code. Here is my 
> approach:
>
> = = = = =
>
> from org.apache.commons.io import IOUtils
> from java.nio.charset import StandardCharsets
> from org.apache.nifi.processor.io import StreamCallback
> import logging
>
> class PyStreamCallback(StreamCallback):
>     def __init__(self):
>         pass
>
>     def process(self, inputStream, outputStream):
>         # What happens to my binary and extreme chars when they get
>         # passed through this step?
>         stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>         # ...
>         # ... (transform and pick out select content)
>         # ...
>         # Am I using the wrong functions to put my text chars, my binary,
>         # and my extreme chars back on the stream as a byte stream? What
>         # should I be doing to handle the variety of data?
>         outputStream.write(bytearray(stuff.encode('utf-8')))
>
> flowFile = session.get()
> if flowFile is not None:
>     incoming = flowFile.getAttribute('filename')
>     logging.info('about to process file: %s', incoming)
>     flowFile = session.write(flowFile, PyStreamCallback())   # line 155 in my code
>     session.transfer(flowFile, REL_SUCCESS)
>     session.commit()
>
> = = = = =
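The inline question in the script asks what happens to binary bytes when they pass through `IOUtils.toString(inputStream, StandardCharsets.UTF_8)`. That step can be illustrated outside NiFi. The sketch below is plain CPython 3. It assumes the Java side behaves like Java's `InputStreamReader`, which replaces malformed input with U+FFFD instead of raising an error. `utf8_round_trip` and `latin1_round_trip` are hypothetical helper names.

```python
def utf8_round_trip(data: bytes) -> bytes:
    """Decode as UTF-8, replacing malformed bytes with U+FFFD, then re-encode."""
    text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8")


def latin1_round_trip(data: bytes) -> bytes:
    """Decode as ISO-8859-1 (one code point per byte), then re-encode."""
    text = data.decode("iso-8859-1")
    return text.encode("iso-8859-1")


if __name__ == "__main__":
    # Valid UTF-8 ("ë" is C3 AB) followed by bytes that are not valid UTF-8.
    payload = b"<name>Zo\xc3\xab</name>" + bytes([0x00, 0x89, 0xFF]) + b"<end/>"
    print(utf8_round_trip(payload) == payload)    # False
    print(latin1_round_trip(payload) == payload)  # True
```

Each malformed byte comes back as the three-byte sequence EF BF BD, so the content written back no longer matches the original. ISO-8859-1, by contrast, maps each of the 256 byte values to exactly one code point, so it round-trips any input unchanged. Applying that idea in NiFi would mean reading with `StandardCharsets.ISO_8859_1` and encoding back with the same charset. That approach is untested here. Also note that non-ASCII UTF-8 letters would then appear to a regex as several characters each.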
>
> When my incoming flowfile is all character content - such as tagged 
> xml - my code works fine. All the flowfiles that also contain some 
> binary data and/or characters at the extremes such as foreign language 
> characters don’t work. They error out. I suspect it has to do with the 
> way I am writing back to the flowfile stream.
>
> Here is the error I am getting:
>
> org.apache.nifi.processor.exception.ProcessException: 
> javax.script.ScriptException: TypeError: write(): 1st arg can't be 
> coerced to int, byte[] in <script> at line number 155
>
> How should I handle the write back to the flowfile in cases where I 
> have a mix of character and binary?
>
> Note: I must do this programmatically. I tried using a combination of 
> SplitContent and MergeContent, but I have no consistent reliable 
> demarcation between the regular text characters and the other more 
> challenging characters that I can split on.
>
> All the examples I've found handle simpler cases than mine: all text, 
> for example, or all JSON. I haven't yet found an example that shows 
> how to write back to the stream when the data is mixed. Can you help?
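One common way to handle mixed payloads is to avoid decoding the whole file at all. Instead, run the regexes against the raw bytes and write the (possibly modified) bytes back. The sketch below is plain CPython 3 and does not use the ExecuteScript/Jython API. The `<name>` tag, `NAME_TAG`, `extract_names`, and `transform_names` are hypothetical. Inside a StreamCallback, the input would have to be read as bytes (Commons IO's `IOUtils.toByteArray(InputStream)` exists for this) and written back with `outputStream.write(...)`. How Jython converts between a Java `byte[]` and a Python byte string is not shown here and should be checked against the Jython version that NiFi bundles.

```python
import re

# Hypothetical tag pattern. A bytes pattern matches raw bytes, so binary
# data elsewhere in the payload is never decoded or altered.
NAME_TAG = re.compile(rb"<name>(.*?)</name>", re.DOTALL)


def extract_names(data: bytes) -> list:
    """Return each <name> value, decoded leniently as UTF-8 for inspection."""
    return [m.group(1).decode("utf-8", errors="replace")
            for m in NAME_TAG.finditer(data)]


def transform_names(data: bytes) -> bytes:
    """Upper-case ASCII letters inside <name> tags; leave all other bytes as-is."""
    # bytes.upper() only changes a-z, so multi-byte UTF-8 sequences survive.
    return NAME_TAG.sub(lambda m: b"<name>" + m.group(1).upper() + b"</name>", data)


if __name__ == "__main__":
    payload = (b"<doc><name>alice</name>" + bytes([0x00, 0x89, 0xFF])
               + b"<name>Zo\xc3\xab</name></doc>")
    print(extract_names(payload))  # ['alice', 'Zoë']
    print(transform_names(payload))
```

Because the binary run between the two tags never falls inside a match, it is copied through byte for byte. Only the selected values change.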

