spark-user mailing list archives

From Gerard Maas <gerard.m...@gmail.com>
Subject Re: Spark Streaming reads from stdin or output from command line utility
Date Fri, 12 Jun 2015 07:21:50 GMT
Would using socketTextStream and `yourApp | nc -lk <port>` work? I'm not
sure how resilient the socket receiver is, though. I've been playing with it
for a small demo and I don't yet understand its reconnection behavior.
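Something along these lines should work as a minimal sketch (the port 9999 and
the app name are my own placeholders, not anything from this thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StdinViaSocket {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("stdin-via-socket")
    val ssc  = new StreamingContext(conf, Seconds(1))

    // Read lines from the nc listener that the utility pipes into,
    // i.e. started beforehand as: yourApp | nc -lk 9999
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that `nc -lk` must be listening before the job connects, and as mentioned
I haven't verified what the receiver does if that connection drops.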

That said, I would think that putting an elastic buffer in between is a good
idea, to decouple the producer from the consumer. Kafka would be my first
choice.
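With Kafka in between, the consuming side would look roughly like this (a
sketch only: the broker address, topic name, and the surrounding `ssc` are
assumptions, and it needs the spark-streaming-kafka artifact on the classpath):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Assumes an existing StreamingContext `ssc`, a broker on localhost:9092,
// and a topic the utility's output is being produced into.
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val topics      = Set("utility-output")

val lines = KafkaUtils
  .createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, topics)
  .map(_._2) // keep just the message value, dropping the key
```

The utility's stdout would then be fed into Kafka by a small producer (or the
console producer), so a slow or restarting Spark job no longer back-pressures
the utility directly.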

-kr, Gerard.

On Fri, Jun 12, 2015 at 8:46 AM, Heath Guo <heathguo@fb.com> wrote:

>  Yes, it is a lot of data, and the utility I'm working with prints out an
> infinite real-time data stream. Thanks.
>
>
>   From: Tathagata Das <tdas@databricks.com>
> Date: Thursday, June 11, 2015 at 11:43 PM
>
> To: Heath Guo <heathguo@fb.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: Spark Streaming reads from stdin or output from command line
> utility
>
>   Is it a lot of data that is expected to come through stdin? I mean is
> it even worth parallelizing the computation using something like Spark
> Streaming?
>
> On Thu, Jun 11, 2015 at 9:56 PM, Heath Guo <heathguo@fb.com> wrote:
>
>>   Thanks for your reply! In my use case, it would be a stream from only
>> one stdin. Also, I'm working with Scala.
>> It would be great if you could talk about the multi-stdin case as well!
>> Thanks.
>>
>>   From: Tathagata Das <tdas@databricks.com>
>> Date: Thursday, June 11, 2015 at 8:11 PM
>> To: Heath Guo <heathguo@fb.com>
>> Cc: user <user@spark.apache.org>
>> Subject: Re: Spark Streaming reads from stdin or output from command
>> line utility
>>
>>    Are you going to receive data from one stdin from one machine, or
>> many stdins on many machines?
>>
>>
>> On Thu, Jun 11, 2015 at 7:25 PM, foobar <heathguo@fb.com> wrote:
>>
>>> Hi, I'm new to Spark Streaming, and I want to create an application where
>>> Spark Streaming creates a DStream from stdin. Basically, I have a command
>>> line utility that generates stream data, and I'd like to pipe that data
>>> into a DStream. What's the best way to do that? I thought rdd.pipe() could
>>> help, but it seems that requires an RDD in the first place, which does not
>>> apply here.
>>> Thanks!
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-reads-from-stdin-or-output-from-command-line-utility-tp23289.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>
>>
>
