I have had to do this for image data and, per Antonio's suggestion, I am encoding and decoding my byte array as base64. I'm using the Clojure DSL and I've found it to be fairly performant (we have more optimizing to do on our image processing side).
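A minimal sketch of that round trip, written in Python to match the Python bolt from the original question rather than in Clojure. It uses only the standard base64 module; encode_bytes and decode_bytes are hypothetical helper names, and bytes(range(256)) is just a stand-in for real image data. Note that base64 emits 4 characters for every 3 input bytes, so the payload grows by about a third before any JSON overhead is counted.

```python
import base64


def encode_bytes(data: bytes) -> str:
    """Turn raw bytes into an ASCII string that JSON can carry (hypothetical helper)."""
    return base64.b64encode(data).decode("ascii")


def decode_bytes(text: str) -> bytes:
    """Recover the original bytes from a base64 string (hypothetical helper)."""
    return base64.b64decode(text)


raw = bytes(range(256))  # stand-in for image data
encoded = encode_bytes(raw)
assert decode_bytes(encoded) == raw

# base64 produces 4 output characters per 3 input bytes (last group padded)
print(len(raw), len(encoded))  # 256 344
```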

Ruhollah Farchtchi
ruhollah.farchtchi@gmail.com



On Jan 8, 2014, at 1:55 PM, Antonio Verardi <antonio@yelp.com> wrote:

Hi,

I am extensively using the multilang interface for Python. JSON is how things are serialized for communication. It adds a fair amount of overhead, but it is a reasonable design choice for a multilang interface.

If your question is: can I read byte array messages from a bolt (made up of command, id, stream, task and tuple), the answer is "that's not that easy; you would have to implement something in order to do that".

If your question is: can I serialize byte arrays in JSON with Python and use them as "values" for the field "tuple", the answer is: "yes, even though JSON always produces string objects" [http://docs.python.org/3.3/library/json.html#basic-usage]. You may want to modify storm.py in order to do that, or simply encode and decode your data within your own bolt; it depends on your needs.
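A sketch of the "encode and decode within your own bolt" option, assuming only the json and base64 modules from the Python standard library rather than any change to storm.py. The message dict is trimmed to the two fields needed for the illustration and is not a complete multilang emit; the PNG header bytes are only sample data.

```python
import base64
import json

payload = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of a PNG file, used as sample binary data

# json.dumps refuses raw bytes
try:
    json.dumps({"tuple": [payload]})
except TypeError as err:
    print("bytes are not JSON serializable:", err)

# Encode the binary value as a base64 string before it goes into "tuple".
message = {
    "command": "emit",
    "tuple": ["field1", base64.b64encode(payload).decode("ascii")],
}
line = json.dumps(message)  # a single line of JSON, safe for line-based I/O

# The receiving bolt decodes the same field back into bytes.
received = json.loads(line)
restored = base64.b64decode(received["tuple"][1])
assert restored == payload
```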

This is something I found just googling about encoding binary data in JSON:
http://bytes.com/topic/python/answers/681314-simplejson-pack-binary-data

I hope this is what you were looking for,
Antonio Uccio Verardi




On Tue, Jan 7, 2014 at 11:24 PM, churly lin <churylin@gmail.com> wrote:
Hi all,

I am trying to write a topology with a KafkaSpout and a ShellBolt (implemented in Python).
According to the multilang protocol, multilang uses JSON messages over stdin/stdout to communicate with the subprocess. Specifically, both ends of this protocol use a line-reading mechanism. Does that mean that in multilang we cannot emit a message as a byte array? If so, how can I read a byte-array tuple in a Python bolt?
The JSON read by the Python bolt looks like this:
{
    "command": "emit",
    // The id for the tuple. Leave this out for an unreliable emit. The id can
    // be a string or a number.
    "id": "1231231",
    // The id of the stream this tuple was emitted to. Leave this empty to emit to default stream.
    "stream": "1",
    // If doing an emit direct, indicate the task to send the tuple to
    "task": 9,
    // All the values in this tuple
    "tuple": ["field1", 2, 3]
}
This example shows that the "tuple" values can be a string ("field1") or numbers (2, 3). Could one of them be a byte array?
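One way to see why base64 sidesteps the line-reading concern: the multilang protocol terminates each JSON message with a line containing only "end", and json.dumps escapes control characters, so a base64-encoded value never introduces a stray newline even when the raw bytes contain one. The write_message and read_message helpers below are hypothetical and only roughly mirror what storm.py does; an in-memory StringIO stands in for the real stdin/stdout pipe.

```python
import base64
import io
import json


def write_message(stream, msg):
    """Write one multilang-style message: a JSON line, then an 'end' line (hypothetical helper)."""
    # json.dumps escapes control characters, so the JSON text itself has no raw newline
    stream.write(json.dumps(msg) + "\n")
    stream.write("end\n")
    stream.flush()


def read_message(stream):
    """Collect lines until the 'end' terminator, then parse them as JSON (hypothetical helper)."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before 'end'")
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


buf = io.StringIO()
image = b"\x00\x01\n\xff"  # binary data that even contains a newline byte
write_message(buf, {
    "command": "emit",
    "id": "1231231",
    "tuple": ["field1", base64.b64encode(image).decode("ascii")],
})
buf.seek(0)
msg = read_message(buf)
assert base64.b64decode(msg["tuple"][1]) == image
```

The byte array travels as an ordinary JSON string, so the receiving side only needs one extra b64decode call on the fields it knows are binary.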