Date: Thu, 9 Jan 2014 15:08:10 +0800
Subject: Re: questions about multilang bolt's STDIN&STDOUT
From: churly lin
To: user

Thank you, Verardi!

Sorry that my poor English made my question ambiguous. Your answer is very clear. Now I know that byte arrays can be used as "values" for the field "tuple".

One more question. In my project, I emit byte array tuples in KafkaSpout like this:

    List<Object> tup = _spoutConfig.scheme.deserialize(Utils.toByteArray(toEmit.msg.payload())); // byte array
    collector.emit(tup, new KafkaMessageId(_partition, toEmit.offset));

But when I tried to read the tuple in ShellBolt, readMsgs() threw an exception. I wrote the emitted JSON message to a file, and it looks like this:

    {"id":"7617035644022584549","stream":"default","comp":"KafkaSpout","tuple":[[B@6d695bcc],"task":1}

It looks very strange to me. What is [[B@6d695bcc]? Is it the byte array's object address? Can it be read by Python?
Going further, if I insist on emitting byte arrays in KafkaSpout, what should I do so that readMsgs() can read them in the Python bolt?

Thanks again.

2014/1/9 Antonio Verardi

> Hi,
>
> I am extensively using the multilang interface for Python. JSON is the way
> you serialize things for communication. It adds a fair amount of
> overhead, but it is a reasonable design choice in terms of a multilang
> interface.
>
> If your question is: can I read byte array messages from a bolt (made up
> of command, id, stream, task and tuple), the answer is "that's not that
> easy, you should implement something in order to do that".
>
> If your question is: can I serialize byte arrays in JSON with Python and
> use them as "values" for the field "tuple", the answer is: "yes, even
> though JSON always produces string objects"
> [http://docs.python.org/3.3/library/json.html#basic-usage]. You may want
> to modify storm.py in order to do that, or simply encode and decode your
> data within your own bolt; it depends on your needs.
>
> This is something I found just by googling about encoding binary data in JSON:
> http://bytes.com/topic/python/answers/681314-simplejson-pack-binary-data
>
> I hope this is what you were looking for,
> Antonio Uccio Verardi
>
> On Tue, Jan 7, 2014 at 11:24 PM, churly lin wrote:
>
>> Hi all,
>>
>> I am trying to write a topology with a KafkaSpout and a ShellBolt
>> (implemented in Python).
>> According to the Multilang-protocol, multilang uses JSON messages over
>> stdin/stdout to communicate with the subprocess. Specifically, *both ends
>> of this protocol use a line-reading mechanism.* Does this mean that, in
>> multilang, we cannot emit a message as a byte array? If not, how can I
>> read a byte array tuple in a Python bolt?
>> The JSON read by the Python bolt looks like this:
>>
>> {
>>     "command": "emit",
>>     // The id for the tuple. Leave this out for an unreliable emit. The id can
>>     // be a string or a number.
>>     "id": "1231231",
>>     // The id of the stream this tuple was emitted to. Leave this empty to emit to default stream.
>>     "stream": "1",
>>     // If doing an emit direct, indicate the task to send the tuple to
>>     "task": 9,
>>     // All the values in this tuple
>>     "tuple": ["field1", 2, 3]
>> }
>>
>> This example shows that the "tuple" values can be a string ("field1") or
>> numbers (2, 3). Could one be a byte array?
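
A note on the [B@6d695bcc value asked about in this thread: that is the default Java toString() form of a byte[] ("[B" is the JVM name for a byte array, followed by "@" and the hash code in hex), not the bytes themselves. It is also not valid JSON, which would explain the readMsgs() failure. One common workaround, in line with the "encode and decode your data" suggestion, is to send the bytes as Base64 text and decode them in the bolt. The following is only a minimal Python sketch under these assumptions: the spout is changed to emit a Base64 string instead of a byte[] (not shown here), each message is terminated by a line containing only "end" as in the multilang protocol, and the helper names encode_payload, decode_payload and read_message are hypothetical, not part of storm.py.

```python
import base64
import io
import json


def encode_payload(raw: bytes) -> str:
    """Turn raw bytes into an ASCII string that is safe to embed in JSON."""
    return base64.b64encode(raw).decode("ascii")


def decode_payload(text: str) -> bytes:
    """Reverse encode_payload: recover the original bytes inside the bolt."""
    return base64.b64decode(text)


def read_message(stream) -> dict:
    """Read one message: JSON text spread over lines, closed by an 'end' line."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


# Simulate what the bolt might see on stdin if the spout emitted Base64 text.
# The message below is hypothetical, modelled on the one quoted in the thread.
payload = b"\x00\x01binary\xff"
wire = json.dumps({
    "id": "7617035644022584549",
    "stream": "default",
    "comp": "KafkaSpout",
    "tuple": [encode_payload(payload)],  # a JSON string, not a byte[]
    "task": 1,
}) + "\nend\n"

message = read_message(io.StringIO(wire))
recovered = decode_payload(message["tuple"][0])
assert recovered == payload  # the bytes survive the JSON round trip
```

Base64 grows the payload by roughly a third, which adds to the JSON overhead mentioned in the reply, so whether this is acceptable depends on message size and throughput.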