I have had to do this for image data and, per Antonio's suggestion, I am encoding and decoding my byte array as base64. I'm using the Clojure DSL and I've found it to be fairly performant (we still have more optimizing to do on our image processing side).

Ruhollah Farchtchi

On Jan 8, 2014, at 1:55 PM, Antonio Verardi <antonio@yelp.com> wrote:


I am using the multilang interface for Python extensively. JSON is how things are serialized for communication. It adds a fair amount of overhead, but it is a reasonable design choice for a multilang interface.

If your question is: can I read byte array messages from a bolt (made up of command, id, stream, task and tuple), the answer is "that's not that easy, you would have to implement something yourself in order to do that".

If your question is: can I serialize byte arrays in JSON with Python and use them as "values" for the field "tuple", the answer is: "yes, although JSON always produces string objects, so the bytes have to be encoded as text first" [http://docs.python.org/3.3/library/json.html#basic-usage]. You may want to modify storm.py to do that, or simply encode and decode your data within your own bolt; it depends on your needs.
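
A minimal sketch of the "encode and decode within your own bolt" option, assuming base64 as the text encoding. The helper names encode_tuple_value and decode_tuple_value are hypothetical and are not part of storm.py; the message dict only mimics the shape of a multilang emit.

```python
import base64
import json


def encode_tuple_value(raw: bytes) -> str:
    """Turn raw bytes into an ASCII string that JSON can carry."""
    return base64.b64encode(raw).decode("ascii")


def decode_tuple_value(text: str) -> bytes:
    """Recover the original bytes from the base64 string."""
    return base64.b64decode(text)


# Sending side: wrap the bytes before they go into the "tuple" field.
payload = bytes([0, 255, 16, 128])
message = json.dumps({"command": "emit", "tuple": [encode_tuple_value(payload)]})

# Receiving side: parse the JSON, then decode the field back to bytes.
received = json.loads(message)
restored = decode_tuple_value(received["tuple"][0])
```

The encoded value is plain ASCII with no line breaks, so it should pass through a line-based JSON reader unchanged; the cost is roughly a one-third increase in size.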

This is something I found just googling about encoding binary data in JSON:

I hope this is what you were looking for,
Antonio Uccio Verardi

On Tue, Jan 7, 2014 at 11:24 PM, churly lin <churylin@gmail.com> wrote:
Hi all,

I am trying to write a topology with a KafkaSpout and a ShellBolt (implemented in Python).
According to the multilang protocol, multilang uses JSON messages over stdin/stdout to communicate with the subprocess. Specifically, both ends of this protocol use a line-reading mechanism. Does this mean that, in multilang, we cannot emit a message as a byte array? If we can, how do I read a byte array tuple in a Python bolt?
The JSON read by the Python bolt looks like this:
    {
        "command": "emit",
        // The id for the tuple. Leave this out for an unreliable emit. The id can
        // be a string or a number.
        "id": "1231231",
        // The id of the stream this tuple was emitted to. Leave this empty to emit to default stream.
        "stream": "1",
        // If doing an emit direct, indicate the task to send the tuple to
        "task": 9,
        // All the values in this tuple
        "tuple": ["field1", 2, 3]
    }
This example shows that the values in "tuple" can be strings ("field1") and numbers (2, 3). Could a value also be a byte array?
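
For illustration, here is a rough sketch of the line-reading mechanism mentioned above. It assumes each JSON message is followed by a line containing only "end", as the multilang protocol documentation describes. The function name read_message is hypothetical and this is not the actual storm.py code.

```python
import io
import json


def read_message(stream):
    """Collect lines until a line equal to 'end', then parse them as JSON."""
    lines = []
    for line in stream:
        stripped = line.rstrip("\n")
        if stripped == "end":
            break
        lines.append(stripped)
    return json.loads("\n".join(lines))


# Simulate stdin with an in-memory text stream holding one framed message.
wire = io.StringIO('{"id": "1231231", "tuple": ["field1", 2, 3]}\nend\n')
msg = read_message(wire)
```

Because the framing is text and line based, raw bytes cannot be written into the stream directly; any binary value has to be turned into a string before it goes into the "tuple" list.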