nifi-users mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary
Date Sat, 04 Feb 2017 03:39:21 GMT
James,

I haven't had a chance to dig into this yet, but one thing I noticed in
your script matches an issue that Bryan Rosander (NiFi committer and
all-around good guy :) identified as the probable cause of this
TypeError: calling bytearray() on the result of encode(), which already
returns a byte array [1]. Does removing the call to bytearray() fix your
script, or are there still problems decoding the input stream?

Regards,
Matt

[1] https://community.hortonworks.com/questions/81291/nifi-executescript-processor-error-using-string-in.html


On Thu, Feb 2, 2017 at 5:38 PM, James McMahon <jsmcmahon3@gmail.com> wrote:
> I have a flowfile with tagged character information that I need to get at
> throughout the first few sections of the file. I need to use regex in Python
> to select some of those values and to transform others. I am using an
> ExecuteScript processor to run my Python code. Here is my approach:
>
> = = = = =
>
> from org.apache.commons.io import IOUtils
> from java.nio.charset import StandardCharsets
> from org.apache.nifi.processor.io import StreamCallback
> import logging
>
> class PyStreamCallback(StreamCallback):
>
>     def __init__(self):
>         pass
>
>     def process(self, inputStream, outputStream):
>         # what happens to my binary and extreme chars when they get passed
>         # through this step?
>         stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>
>         # ... (transform and pick out select content) ...
>
>         # am I using the wrong functions to put my text chars and my binary
>         # and my extreme chars back on the stream as a byte stream? What
>         # should I be doing to handle the variety of data?
>         outputStream.write(bytearray(stuff.encode('utf-8')))
>
> flowFile = session.get()
> if flowFile != None:
>     incoming = flowFile.getAttribute('filename')
>     logging.info('about to process file: %s', incoming)
>     flowFile = session.write(flowFile, PyStreamCallback())   # line 155 in my code
>     session.transfer(flowFile, REL_SUCCESS)
>     session.commit()
>
> = = = = =
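
On the question in the first script comment: as far as I know, IOUtils.toString reads through a Java InputStreamReader, which by default replaces malformed input with the Unicode replacement character U+FFFD rather than failing. So any bytes that are not valid UTF-8 are silently altered before your transform even runs. The sketch below imitates that in CPython 3 (not the Jython 2.7 engine ExecuteScript uses) with errors="replace"; the sample bytes are made up for illustration.

```python
# CPython 3 sketch, not Jython: mimic a UTF-8 decode that replaces malformed
# input (as Java's InputStreamReader does by default), then re-encode.

def utf8_round_trip(data: bytes) -> bytes:
    """Decode as UTF-8, substituting U+FFFD for invalid bytes, then re-encode."""
    text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8")

# Hypothetical content: valid UTF-8 text ("Café") followed by raw binary bytes.
text_only = b"<title>Caf\xc3\xa9</title>"
mixed = text_only + b"\x00\xff\xfe\x80"

print(utf8_round_trip(text_only) == text_only)  # True: pure UTF-8 text survives
print(utf8_round_trip(mixed) == mixed)          # False: the binary tail is replaced
```

That matches the symptom: all-text flowfiles come through intact, while mixed ones cannot be written back byte-for-byte once they have been decoded as UTF-8.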
>
> When my incoming flowfile is all character content, such as tagged XML, my
> code works fine. Every flowfile that also contains some binary data and/or
> non-ASCII characters, such as foreign-language characters, fails with an
> error. I suspect the problem is in the way I am writing back to the
> flowfile stream.
>
> Here is the error I am getting:
>
> org.apache.nifi.processor.exception.ProcessException:
> javax.script.ScriptException: TypeError: write(): 1st arg can't be coerced
> to int, byte[] in <script> at line number 155
>
> How should I handle the write back to the flowfile in cases where I have a
> mix of character and binary?
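
One common workaround, offered as an untested suggestion rather than a confirmed fix, is to decode with ISO-8859-1 (Latin-1) instead of UTF-8. Latin-1 maps each of the 256 byte values to exactly one character, so any content decodes without error and re-encodes to identical bytes; a regex can then work on the ASCII markup while binary passes through untouched. In the Jython script the analogous change would presumably be passing StandardCharsets.ISO_8859_1 to IOUtils.toString and encoding with 'iso-8859-1' before writing. Caveats: a UTF-8 foreign character shows up as two or more Latin-1 characters, so patterns should target ASCII tags, and replacement text must be Latin-1 encodable. The <status> tag and its new value below are hypothetical, and the sketch is CPython 3.

```python
import re

# Hypothetical transformation: replace whatever sits inside <status>...</status>.
STATUS = re.compile(r"<status>(.*?)</status>", re.DOTALL)

def transform_latin1(data: bytes) -> bytes:
    # ISO-8859-1 maps each byte 0x00-0xFF to exactly one code point,
    # so decoding can never fail, whatever the content.
    text = data.decode("iso-8859-1")
    # Select a value with a regex on the ASCII markup.
    match = STATUS.search(text)
    if match:
        print("old status:", match.group(1))
    text = STATUS.sub("<status>reviewed</status>", text)
    # Encoding with the same charset turns every untouched character back
    # into its original byte, so the binary parts come out unchanged.
    return text.encode("iso-8859-1")

record = b"<status>new</status><name>Caf\xc3\xa9</name>\x00\xff\xfe\x80"
print(transform_latin1(record))
# b'<status>reviewed</status><name>Caf\xc3\xa9</name>\x00\xff\xfe\x80'
```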
>
> Note: I must do this programmatically. I tried a combination of
> SplitContent and MergeContent, but there is no consistent, reliable
> demarcation between the regular text characters and the more challenging
> characters that I can split on.
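
Without a reliable delimiter to split on, another option is to skip decoding entirely and run a bytes pattern over the raw content: Python's re module accepts bytes patterns, and a substitution only touches the bytes it matches, so surrounding binary is copied through as-is. The <id> tag and the uppercase transform below are hypothetical stand-ins for your own selection and transformation; the sketch is written for CPython 3, and adapting it to Jython 2.7 is untested.

```python
import re

# Hypothetical tag: the value between <id> and </id>, ASCII letters/digits only.
ID_TAG = re.compile(rb"<id>([A-Za-z0-9]+)</id>")

def first_id(data: bytes):
    """Return the first <id> value as bytes, or None if there is none."""
    m = ID_TAG.search(data)
    return m.group(1) if m else None

def uppercase_ids(data: bytes) -> bytes:
    """Uppercase every <id> value; all other bytes are copied through unchanged."""
    return ID_TAG.sub(lambda m: b"<id>" + m.group(1).upper() + b"</id>", data)

blob = b"\x89\x00<id>ab12</id>\xff\xfe rest"
print(first_id(blob))       # b'ab12'
print(uppercase_ids(blob))  # b'\x89\x00<id>AB12</id>\xff\xfe rest'
```

Because nothing is ever decoded, there is no charset to get wrong; the trade-off is that patterns must be written in terms of bytes, which is easiest when the tags themselves are ASCII.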
>
> All the examples I've found handle purer cases than mine: all text, for
> example, or all JSON. I haven't yet found an example that shows how to
> write back to the stream when the data is mixed. Can you help?
