From: Ruhollah Farchtchi
To: "user@storm.incubator.apache.org"
Date: Sun, 12 Jan 2014 11:17:16 -0500
Subject: Re: Large binary payloads with storm
Mailing-List: contact user-help@storm.incubator.apache.org; run by ezmlm
Reply-To: user@storm.incubator.apache.org

Yep. That's what I figured. Thanks.

On Sunday, January 12, 2014, Nathan Leung wrote:

The multilang interface uses JSON, which is a text format. Given an earlier email (http://mail-archives.apache.org/mod_mbox/storm-user/201401.mbox/%3CCAEN10JreBSFO-=xhNjbn9r+5+F+G=AZ8rW58qDo8x32Gd-xUkg@mail.gmail.com%3E) the object appears to be serialized to JSON using toString, which for a byte array yields [B@<reference>, where the [B is type information specifying a byte array. Therefore you will have to encode to something like base64 that can represent your binary data as text.
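
To illustrate the base64 suggestion above, here is a minimal Python sketch of carrying binary data inside a JSON message. The function names and the "payload" key are made up for this example; they are not part of storm.py or of the multilang protocol, so treat this as a sketch of the encoding step only.

```python
import base64
import json


def encode_payload(raw: bytes) -> str:
    """Wrap binary data in a JSON document by base64-encoding it first."""
    # "payload" is a hypothetical field name chosen for this sketch.
    return json.dumps({"payload": base64.b64encode(raw).decode("ascii")})


def decode_payload(message: str) -> bytes:
    """Recover the original bytes from a message built by encode_payload."""
    return base64.b64decode(json.loads(message)["payload"])


if __name__ == "__main__":
    image_bytes = bytes(range(256))  # stand-in for a small image file
    message = encode_payload(image_bytes)
    # The round trip through JSON text returns the exact same bytes.
    assert decode_payload(message) == image_bytes
```

The cost is size: base64 produces 4 output characters for every 3 input bytes, so the text form is roughly a third larger than the raw data.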

On Jan 12, 2014 10:49 AM, "Ruhollah Farchtchi" <ruhollah.farchtchi@gmail.com> wrote:
I am using 0.9. What I think is the issue is that storm.py is having problems when deserializing a byte array. When I encode as base64 binary string I have no problems and it deserializes fine. Of course I would like to avoid this extra overhead if possible. All my binary objects are relatively small, 200-300k max.

On Sunday, January 12, 2014, 李家宏 wrote:
Hi Farchtchi,

Which Storm version are you using?
If the tuple is not serialized, then there is no need to use a JSON parser to parse the received tuple, I guess.

Regards


2014/1/11 Ruhollah Farchtchi <ruhollah.farchtchi@gmail.com>
Yes, I read that in the docs. However, when receiving the byte array in storm.py it throws a JSON error when trying to parse the tuples. I didn't have time to look into it further as I am new to Storm and Python.


On Saturday, January 11, 2014, 李家宏 wrote:
There is no need to serialize binary data; just send it as is.
Since storm-0.9.0 uses the Kryo serializer by default to serialize tuple values, I guess we can skip this serialization step.

Regards  



2014/1/10 Jon Logan <jmlogan@buffalo.edu>
You're going to run into issues if you have large tuples, because they are buffered in memory. I would suggest moving it to an exterior channel, like Redis, etc, and only passing meta-data through Storm.

Your other solution is to use quirky things like reflection to prevent your application from running out of memory when tuples are buffered.
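
The first suggestion above (keep the payload in an external channel and pass only metadata through Storm) could look roughly like the Python sketch below. A plain dict stands in for Redis so the example runs on its own; the helper names and the metadata fields are hypothetical and are not taken from Storm or from any Redis client.

```python
import hashlib
from typing import Dict

# Stand-in for an external store such as Redis. A plain dict keeps the
# sketch self-contained; a real deployment would use a real client here.
blob_store: Dict[str, bytes] = {}


def stash_payload(raw: bytes) -> dict:
    """Store the bytes externally and return small, JSON-friendly metadata."""
    key = hashlib.sha256(raw).hexdigest()  # content hash used as lookup key
    blob_store[key] = raw
    return {"key": key, "size": len(raw)}


def fetch_payload(meta: dict) -> bytes:
    """Look the bytes back up using the metadata carried in the tuple."""
    return blob_store[meta["key"]]


if __name__ == "__main__":
    image_bytes = b"\x00\xff" * 1000  # stand-in for an image file
    meta = stash_payload(image_bytes)  # this small dict is what would be emitted
    assert fetch_payload(meta) == image_bytes
```

Only the small metadata dict travels between bolts, so the tuples buffered in memory stay small regardless of the payload size.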


On Fri, Jan 10, 2014 at 8:49 AM, Ruhollah Farchtchi <ruhollah.farchtchi@gmail.com> wrote:
I am using Storm to process small (< 100k) image files. I don't have a real-time requirement as yet, but my bottleneck is more in the image processing than message passing between bolts. I am using the Clojure DSL and the Python bolt. Everything I've put together right now is very much a prototype, so my next steps are some further processing and integration. Passing byte arrays didn't seem to work so well, so I have had to encode/decode into base64 binary, as it seems the JSON parsers on the Python side didn't like byte arrays. I plan to go back and perhaps redo the integration with a native C++ bolt; however, I believe that there are other ways to do this integration as well. As with Wilson, I'm interested if anyone else is using Storm to process binary payloads and what they have found works.

Thanks,

Ruhollah

Ruhollah Farchtchi
ruhollah.farchtchi@gmail.com


On Thu, Jan 9, 2014 at 10:24 PM, Lochlainn Wilson <lochlainn.wilson@gmail.com> wrote:
Hi all,

I am new to Storm and have been tasked with determining whether it is feasible for us to use Apache Storm in my company. I have of course configured the sample projects and have been poking around. A red flag is raised with the "stream processing" style JSON parsing.

I am considering using Storm with real time image processing bolts in C++. Packaging binary data into a JSON (by escaping it) looks like it will be slow and expensive. Is there a better way? Does anyone have experience processing large streams of binary data through Storm?

How did it go?

Regards,

Lochlainn





--
======================================================

Gvain



--
Ruhollah Farchtchi
ruhollah.farchtchi@gmail.com