Yep. That's what I figured. Thanks. 

On Sunday, January 12, 2014, Nathan Leung wrote:

The multilang interface uses JSON, which is a text format. Given an earlier email (http://mail-archives.apache.org/mod_mbox/storm-user/201401.mbox/%3CCAEN10JreBSFO-=xhNjbn9r+5+F+G=AZ8rW58qDo8x32Gd-xUkg@mail.gmail.com%3E), the object appears to be serialized to JSON using toString, which for a byte array yields [B@<reference>, where the [B is type information specifying a byte array. Therefore you will have to encode to something like base64 that can represent your binary data in a text format.
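
For illustration, here is a minimal sketch of the base64 round trip described above, using only the Python standard library. The helper names (encode_payload, decode_payload) and the {"tuple": [...]} message shape are assumptions made for this example, not the actual multilang protocol or storm.py API.

```python
import base64
import json


def encode_payload(data: bytes) -> str:
    """Turn raw bytes into an ASCII string that is safe to embed in JSON."""
    return base64.b64encode(data).decode("ascii")


def decode_payload(text: str) -> bytes:
    """Recover the original bytes from the base64 string."""
    return base64.b64decode(text)


# Every possible byte value, to show that nothing is lost in transit.
raw = bytes(range(256))

# Hypothetical message shape: one tuple field carrying the encoded payload.
message = json.dumps({"tuple": [encode_payload(raw)]})

# Receiving side: parse the JSON text, then decode the field.
restored = decode_payload(json.loads(message)["tuple"][0])
assert restored == raw
```

The sender would apply the equivalent encoding (for example in the Clojure or Java component) before emitting, and the Python bolt would decode after parsing the tuple.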

On Jan 12, 2014 10:49 AM, "Ruhollah Farchtchi" <ruhollah.farchtchi@gmail.com> wrote:
I am using 0.9. What I think is the issue is that storm.py has problems deserializing a byte array. When I encode the data as a base64 string I have no problems and it deserializes fine. Of course I would like to avoid this extra overhead if possible. All my binary objects are relatively small, 200-300k max.
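
To put a rough number on the overhead mentioned above: base64 emits 4 output characters for every 3 input bytes, so the encoded size is about one third larger than the original. The sketch below checks this for a 300 KiB payload of random bytes; the function name base64_size is made up for this example, and the measurement covers size only, not CPU time.

```python
import base64
import os


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


# Random stand-in for a binary object at the upper end of the stated range.
payload = os.urandom(300 * 1024)
encoded = base64.b64encode(payload)

# The formula matches what the standard library actually produces.
assert len(encoded) == base64_size(len(payload))

# Size ratio of encoded text to raw bytes (about 4/3).
ratio = len(encoded) / len(payload)
print(len(payload), len(encoded), round(ratio, 3))
```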

On Sunday, January 12, 2014, a wrote:
Hi Farchtchi,

Which Storm version are you using?
If the tuple is not serialized, then there is no need to use a JSON parser to parse the received tuple. I guess so.

Regards


2014/1/11 Ruhollah Farchtchi <ruhollah.farchtchi@gmail.com>
Yes, I read that in the docs. However, when receiving the byte array in storm.py, it throws a JSON error when trying to parse the tuples. I didn't have time to look into it further as I am new to Storm and Python.


On Saturday, January 11, 2014, a wrote:
There is no need to serialize binary data, just send it as is.
Since storm-0.9.0 uses the Kryo serializer by default to serialize tuple values, I guess we can skip this serialization step.

Regards  



2014/1/10 Jon Logan <jmlogan@buffalo.edu>
You're going to run into issues if you have large tuples, because they are buffered in memory. I would suggest moving the payload to an external channel, like Redis, and only passing metadata through Storm.
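
A rough sketch of that pattern, under stated assumptions: the BlobStore class below is an in-memory stand-in for an external store such as Redis, and its put/get methods, the content-hash key scheme, and the metadata fields are all hypothetical choices for this example. Only the small metadata dictionary would travel through Storm as a tuple.

```python
import hashlib
import json


class BlobStore:
    """In-memory stand-in for an external store such as Redis (hypothetical)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Key the blob by its content hash so the key is short, text-safe,
        # and identical payloads map to the same entry.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = BlobStore()
image = bytes(range(256)) * 100  # placeholder for real image bytes

# Upstream component: store the payload, emit only metadata.
meta = {"key": store.put(image), "size": len(image)}
tuple_json = json.dumps(meta)  # small, plain-text tuple

# Downstream component: parse the metadata and fetch the payload.
received = json.loads(tuple_json)
fetched = store.get(received["key"])
assert fetched == image
```

With a real store, both components would need network access to it, and something would have to expire or delete the blobs once processing finishes.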

Your other solution is to use quirky things like reflection to prevent your application from running out of memory when tuples are buffered.


On Fri, Jan 10, 2014 at 8:49 AM, Ruhollah Farchtchi <ruhollah.farchtchi@gmail.com> wrote:
I am using Storm to process small (< 100k) image files. I don't have a real-time requirement as yet, and my bottleneck is more in the image processing than in message passing between bolts. I am using the Clojure DSL and the Python bolt. Everything I've put together right now is very much a prototype, so my next steps are some further processing and integration. Passing byte arrays didn't seem to work so well, so I have had to encode/decode into base64, as it seems the JSON parsers on the Python side didn't like byte arrays. I plan to go back and perhaps redo the integration with a native C++ bolt; however, I believe that there are other ways to do this integration as well. As with Wilson, I'm interested in whether anyone else is using Storm to process binary payloads and what they have found works.

Thanks,

Ruhollah



On Thu, Jan 9, 2014 at 10:24 PM, Lochlainn Wilson <lochlainn.wilson@gmail.com> wrote:
Hi all,

I am new to Storm and have been tasked with determining whether it is feasible for us to use Apache Storm in my company. I have of course configured the sample projects and have been poking around. A red flag is raised with the "stream processing" style JSON parsing.

I am considering using Storm with real-time image processing bolts in C++. Packaging binary data into JSON (by escaping it) looks like it will be slow and expensive. Is there a better way? Does anyone have experience processing large streams of binary data through Storm?

How did it go?

Regards,

Lochlainn





--

======================================================

Gvain



--
Ruhollah Farchtchi
ruhollah.farchtchi@gmail.com