nifi-users mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary
Date Thu, 02 Feb 2017 23:56:32 GMT
James,

If you'd rather work with the inputStream as bytes, you don't need the
IOUtils.toString() call, and I'm not sure what a UTF-8 charset would
do to your mixed data.  You can wrap any of the *InputStream
decorators around the inputStream object, such as DataInputStream [1]
to read various data types from the underlying bytes in the stream.
Alternatively you may want to read all the bytes into an array you can
work with directly via Jython methods instead of using Java I/O.
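To illustrate the bytes-only route, here is a minimal sketch in plain Python, run outside NiFi. The sample content, the <name> tag, and both function names are made up for the example and are not taken from James's script. It applies a regex directly to the raw bytes, so the binary tail is never decoded, and for comparison shows that a UTF-8 decode/encode round trip (using Python's errors="replace" as a stand-in for whatever the Java decoder does) does not give back the original bytes.

```python
import re

# Hypothetical flowfile content: tagged text followed by arbitrary binary bytes.
SAMPLE = b"<name>alpha</name>\n<id>42</id>\n" + b"\x00\xff\xfe\x80\n"


def transform_bytes(data):
    """Upper-case the <name> value without ever decoding the content."""
    # A bytes pattern matches against raw bytes, so anything outside the
    # match (including bytes that are not valid text) passes through as-is.
    return re.sub(
        br"<name>(.*?)</name>",
        lambda m: b"<name>" + m.group(1).upper() + b"</name>",
        data,
    )


def utf8_round_trip(data):
    """Decode as UTF-8 and encode again, replacing invalid sequences."""
    # errors="replace" swaps each invalid sequence for U+FFFD, which is lossy.
    return data.decode("utf-8", errors="replace").encode("utf-8")


result = transform_bytes(SAMPLE)
```

Inside ExecuteScript the same idea would apply once the stream has been read into a byte array, though how best to get the bytes in and out of the Java streams from Jython is not something this sketch covers.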

What's weird about the TypeError is that it appears to be calling a
different write() method than I would have expected. I wonder if the
translation of Jython objects to Java objects is somehow keeping the
processor from matching up a method signature.  If the error is not
occurring in the redacted code block above, I will give this script a
try to see if I can reproduce and/or fix the error.

Regards,
Matt

[1] https://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html


On Thu, Feb 2, 2017 at 6:19 PM, James McMahon <jsmcmahon3@gmail.com> wrote:
> This is very helpful Russell, but in my case each file is a mix of data
> types. So even if I determine that the flowfile is a mix, I'd still have to
> be poised to tackle it in my ExecuteScript script. Good suggestion, though,
> and one I can use in other ways in my workflows.
>
> I do hope someone can tell me what I can do in my callback's write-back to
> handle it all. I'd like to better understand this error I'm getting, too.  -Jim
>
> On Thu, Feb 2, 2017 at 6:02 PM, Russell Bateman <russ@windofkeltia.com>
> wrote:
>>
>> Could you use RouteOnContent to determine what sort of content you're
>> dealing with, then branch to different ExecuteScript processors rigged to
>> different Python scripts?
>>
>> Hope this comment is helpful.
>>
>>
>> On 02/02/2017 03:38 PM, James McMahon wrote:
>>
>> I have a flowfile that has tagged character information I need to get at
>> throughout the first few sections of the file. I need to use regex in python
>> to select some of those values and to transform others. I am using an
>> ExecuteScript processor to execute my python code. Here is my approach:
>>
>>
>>
>> = = = = =
>>
>> class PyStreamCallback(StreamCallback):
>>
>>    def __init__(self):
>>       pass
>>
>>    def process(self, inputStream, outputStream):
>>       # what happens to my binary and extreme chars when they get passed
>>       # through this step?
>>       stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>>
>>       .
>>       . (transform and pick out select content)
>>       .
>>
>>       # am I using the wrong functions to put my text chars and my binary
>>       # and my extreme chars back on the stream as a byte stream? What
>>       # should I be doing to handle the variety of data?
>>       outputStream.write(bytearray(stuff.encode('utf-8')))
>>
>> flowFile = session.get()
>> if flowFile != None:
>>    incoming = flowFile.getAttribute('filename')
>>    logging.info('about to process file: %s', incoming)
>>    flowFile = session.write(flowFile, PyStreamCallback())   # line 155 in my code
>>    session.transfer(flowFile, REL_SUCCESS)
>>    session.commit()
>>
>> = = = = =
>>
>>
>>
>> When my incoming flowfile is all character content - such as tagged xml -
>> my code works fine. All the flowfiles that also contain some binary data
>> and/or characters at the extremes such as foreign language characters don’t
>> work. They error out. I suspect it has to do with the way I am writing back
>> to the flowfile stream.
>>
>>
>>
>> Here is the error I am getting:
>>
>> org.apache.nifi.processor.exception.ProcessException:
>> javax.script.ScriptException: TypeError: write(): 1st arg can’t be coerced
>> to int, byte[] in <script> at line number 155
>>
>>
>>
>> How should I handle the write back to the flowfile in cases where I have a
>> mix of character and binary?
>>
>>
>>
>> Note: I must do this programmatically. I tried using a combination of
>> SplitContent and MergeContent, but I have no consistent reliable demarcation
>> between the regular text characters and the other more challenging
>> characters that I can split on.
>>
>> All the examples I've found handle more pure circumstances than mine seems
>> to be. For example, all text. Or all JSON. I've not yet been able to find an
>> example that shows me how to write back to the stream for mixed data
>> situations. Can you help?
>>
>>
>
