storm-user mailing list archives

From Ruhollah Farchtchi <ruhollah.farcht...@gmail.com>
Subject Re: questions about multilang bolt's STDIN&STDOUT
Date Wed, 08 Jan 2014 19:57:28 GMT
I have had to do this for image data. Per Antonio’s suggestion, I am encoding my byte arrays
to base64 and decoding them on the other side. I’m using the Clojure DSL and have found it to
be fairly performant (we still have more optimizing to do on our image-processing side).
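
For anyone doing the same from a Python bolt, here is a minimal sketch of the base64 round trip. It is not taken from storm.py: the helper names (encode_tuple_bytes, decode_tuple_bytes, build_emit_message) and the sample payload are hypothetical, and the message only mimics the "command"/"tuple" shape quoted further down. The point is that the encoded value is plain ASCII with no newlines, so it should survive a line-based JSON reader.

```python
import base64
import json


def encode_tuple_bytes(payload: bytes) -> str:
    """Turn raw bytes into an ASCII string that is safe inside a JSON value."""
    return base64.b64encode(payload).decode("ascii")


def decode_tuple_bytes(text: str) -> bytes:
    """Reverse encode_tuple_bytes on the receiving side."""
    return base64.b64decode(text)


def build_emit_message(payload: bytes) -> str:
    """Serialize a hypothetical emit message as a single line of JSON."""
    message = {"command": "emit", "tuple": [encode_tuple_bytes(payload)]}
    return json.dumps(message)


# Bytes that would confuse a line-based reader if written raw.
raw = b"\x00\xff\n binary image data \r\n"

line = build_emit_message(raw)           # what one side would write
received = json.loads(line)              # what the other side would parse
restored = decode_tuple_bytes(received["tuple"][0])
```

The trade-off is size: base64 grows the payload by roughly a third, on top of the JSON overhead Antonio mentions below.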

Ruhollah Farchtchi
ruhollah.farchtchi@gmail.com



On Jan 8, 2014, at 1:55 PM, Antonio Verardi <antonio@yelp.com> wrote:

> Hi,
> 
> I am extensively using the multilang interface for Python. JSON is the way you serialize
> things for communication. It adds a fair amount of overhead, but it is a reasonable design
> choice for a multilang interface.
> 
> If your question is: can I read byte array messages from a bolt (made up of command,
> id, stream, task and tuple), the answer is "that's not that easy, you would have to
> implement something yourself in order to do that".
> 
> If your question is: can I serialize byte arrays in JSON with Python and use them as
> "values" for the field "tuple", the answer is: "yes, even though JSON always produces string
> objects" [http://docs.python.org/3.3/library/json.html#basic-usage]. You may want to modify
> storm.py in order to do that, or simply encode and decode your data within your own bolt;
> it depends on your needs.
> 
> This is something I found by googling for how to encode binary data in JSON:
> http://bytes.com/topic/python/answers/681314-simplejson-pack-binary-data
> 
> I hope it was what you were looking for,
> Antonio Uccio Verardi
> 
> 
> 
> 
> On Tue, Jan 7, 2014 at 11:24 PM, churly lin <churylin@gmail.com> wrote:
> Hi all,
> 
> I am trying to write a topology with a KafkaSpout and a ShellBolt (implemented in Python).
> According to the multilang protocol, multilang uses JSON messages over stdin/stdout to
> communicate with the subprocess. Specifically, both ends of this protocol use a line-reading
> mechanism. Does that mean that, in multilang, we cannot emit a message as a byte array? If we
> can, how do I read a byte array tuple in a Python bolt?
> The JSON read by the Python bolt looks like this:
> 
> {
>     "command": "emit",
>     // The id for the tuple. Leave this out for an unreliable emit. The id can
>     // be a string or a number.
>     "id": "1231231",
>     // The id of the stream this tuple was emitted to. Leave this empty to emit to
>     // default stream.
>     "stream": "1",
>     // If doing an emit direct, indicate the task to send the tuple to
>     "task": 9,
>     // All the values in this tuple
>     "tuple": ["field1", 2, 3]
> }
> This example shows that the "tuple" can contain a string ("field1") and numbers (2, 3).
> Could it also contain a byte array?
> 

