spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From PhuDuc Nguyen <duc.was.h...@gmail.com>
Subject Re: dynamic allocation w/ spark streaming on mesos?
Date Wed, 11 Nov 2015 21:36:45 GMT
Awesome, thanks for the tip!



On Wed, Nov 11, 2015 at 2:25 PM, Tathagata Das <tdas@databricks.com> wrote:

> The reason the existing dynamic allocation does not work out of the box
> for spark streaming is because the heuristics used for decided when to
> scale up/down is not the right one for micro-batch workloads. It works
> great for typical batch workloads. However you can use the underlying
> developer API to add / remove executors to implement your own scaling
> logic.
>
> 1. Use SparkContext.requestExecutor and SparkContext.killExecutor
>
> 2. Use StreamingListener to get the scheduling delay and processing times,
> and use that do a request or kill executors.
>
> TD
>
> On Wed, Nov 11, 2015 at 9:48 AM, PhuDuc Nguyen <duc.was.here@gmail.com>
> wrote:
>
>> Dean,
>>
>> Thanks for the reply. I'm searching (via spark mailing list archive and
>> google) and can't find the previous thread you mentioned. I've stumbled
>> upon a few but may not be the thread you're referring to. I'm very
>> interested in reading that discussion and any links/keywords would be
>> greatly appreciated.
>>
>> I can see it's a non-trivial problem to solve for every use case in
>> streaming and thus not yet supported in general. However, I think (maybe
>> naively) it can be solved for specific use cases. If I use the available
>> features to create a fault tolerant design - i.e. failures/dead nodes can
>> occur on master nodes, driver node, or executor nodes without data loss and
>> "at-least-once" semantics is acceptable - then can't I safely scale down in
>> streaming by killing executors? If this is not true, then are we saying
>> that streaming is not fault tolerant?
>>
>> I know it won't be as simple as setting a config like
>> spark.dyanmicAllocation.enabled=true and magically we'll have elastic
>> streaming, but I'm interested if anyone else has attempted to solve this
>> for their specific use case with extra coding involved? Pitfalls? Thoughts?
>>
>> thanks,
>> Duc
>>
>>
>>
>>
>> On Wed, Nov 11, 2015 at 8:36 AM, Dean Wampler <deanwampler@gmail.com>
>> wrote:
>>
>>> Dynamic allocation doesn't work yet with Spark Streaming in any cluster
>>> scenario. There was a previous thread on this topic which discusses the
>>> issues that need to be resolved.
>>>
>>> Dean Wampler, Ph.D.
>>> Author: Programming Scala, 2nd Edition
>>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>>> Typesafe <http://typesafe.com>
>>> @deanwampler <http://twitter.com/deanwampler>
>>> http://polyglotprogramming.com
>>>
>>> On Wed, Nov 11, 2015 at 8:09 AM, PhuDuc Nguyen <duc.was.here@gmail.com>
>>> wrote:
>>>
>>>> I'm trying to get Spark Streaming to scale up/down its number of
>>>> executors within Mesos based on workload. It's not scaling down. I'm using
>>>> Spark 1.5.1 reading from Kafka using the direct (receiver-less) approach.
>>>>
>>>> Based on this ticket https://issues.apache.org/jira/browse/SPARK-6287
>>>> with the right configuration, I have a simple example working with the
>>>> spark-shell connected to a Mesos cluster. By working I mean the number of
>>>> executors scales up/down based on workload. However, the spark-shell is not
>>>> a streaming example.
>>>>
>>>> What is that status of dynamic resource allocation with Spark Streaming
>>>> on Mesos? Is it supported at all? Or supported but with some caveats to
>>>> ensure no data loss?
>>>>
>>>> thanks,
>>>> Duc
>>>>
>>>
>>>
>>
>

Mime
View raw message