spark-dev mailing list archives

From Gerard Maas <gerard.m...@gmail.com>
Subject Re: Tuning Spark Streaming jobs
Date Mon, 05 Jan 2015 13:22:47 GMT
Hi Tim,

First of all, let me wish you a happy and fulfilling New Year.
Sorry for the delay in my response. I was out for the xmas break.

I've added my thoughts to the ticket from the perspective of a streaming
job.
@TD: What do you think?

-kr, Gerard.

On Tue, Dec 23, 2014 at 8:02 PM, Timothy Chen <tnachen@gmail.com> wrote:

> Hi Gerard,
>
> SPARK-4286 is the ticket I am working on. Besides supporting the shuffle
> service, it also supports the executor scaling callbacks (kill/request
> total) for coarse-grained mode.
>
> I created SPARK-4940 to discuss the distribution problem further;
> let's move our discussion there.
>
> Tim
>
>
>
> On Dec 22, 2014, at 11:16 AM, Gerard Maas <gerard.maas@gmail.com> wrote:
>
> Hi Tim,
>
> That would be awesome. We have seen some really disparate Mesos
> allocations for our Spark Streaming jobs (like (7,4,1) over 3 executors
> for 4 Kafka consumers instead of the ideal (3,3,3,3)).
> For network-dependent consumers, achieving an even deployment would
> provide reliable and reproducible streaming job execution from a
> performance point of view.
> We're deploying in coarse-grained mode. Not sure Spark Streaming would work
> well in fine-grained mode, given the added latency to acquire a worker.
>
> You mention that you're changing the Mesos scheduler. Is there a Jira
> ticket where this work is taking place?
>
> -kr, Gerard.
>
>
> On Mon, Dec 22, 2014 at 6:01 PM, Timothy Chen <tnachen@gmail.com> wrote:
>
>> Hi Gerard,
>>
>> Really nice guide!
>>
>> I'm particularly interested in the Mesos scheduling side, to distribute
>> cores more evenly across the cluster.
>>
>> Are you using coarse-grained mode or fine-grained mode?
>>
>> I'm making changes to the Spark Mesos scheduler, and I think we can
>> propose the best way to achieve what you mentioned.
>>
>> Tim
>>
>> Sent from my iPhone
>>
>> > On Dec 22, 2014, at 8:33 AM, Gerard Maas <gerard.maas@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > After facing issues with the performance of some of our Spark Streaming
>> > jobs, we invested considerable effort in figuring out the factors that
>> > affect the performance characteristics of a Streaming job. We defined an
>> > empirical model that helps us reason about Streaming jobs and applied it
>> > to tune the jobs in order to maximize throughput.
>> >
>> > We have summarized our findings in a blog post with the intention of
>> > collecting feedback and hoping that it is useful to other Spark
>> > Streaming users facing similar issues.
>> >
>> > http://www.virdata.com/tuning-spark/
>> >
>> > Your feedback is welcome.
>> >
>> > With kind regards,
>> >
>> > Gerard.
>> > Data Processing Team Lead
>> > Virdata.com
>> > @maasg
>>
>
>
