spark-dev mailing list archives

From Jonathan Kelly <jonathaka...@gmail.com>
Subject Re: spark on yarn wastes one box (or 1 GB on each box) for am container
Date Tue, 09 Feb 2016 06:49:11 GMT
Alex,

That's a very good question that I've been trying to answer myself recently
too. Since you've mentioned before that you're using EMR, I assume you're
asking this because you've noticed this behavior on emr-4.3.0.

In this release, we made some changes to the maximizeResourceAllocation
feature (which you may or may not be using, but either way this issue is
present). Those changes accidentally introduced a bug: no memory is
reserved for the AM, so one of the nodes ends up being used only by the AM
and not by an executor.

However, as you point out, the only viable fix seems to be to reserve
enough memory for the AM on *every single node*, which in some cases might
actually be worse than wasting a lot of memory on a single node.

So yeah, I also don't like either option. Is this just the price you pay
for running on YARN?


~ Jonathan
On Mon, Feb 8, 2016 at 9:03 PM Alexander Pivovarov <apivovarov@gmail.com>
wrote:

> Let's say that yarn has 53GB memory available on each slave
>
> The Spark AM container needs 896MB (512 + 384).
>
> I see two options to configure spark:
>
> 1. configure spark executors to use 52GB and leave 1 GB on each box. One
> box will then also run the AM container, so 1GB of memory will go unused
> on every slave but one.
>
> 2. configure spark to use all 53GB and add an additional 53GB box which
> will run only the AM container. So, 52GB on this additional box will do
> nothing.
>
> I do not like either option. Is there a better way to configure yarn/spark?
>
>
> Alex
>
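
To put rough numbers on the two options in the quoted message, here is a small Python sketch. It assumes the 896MB AM container occupies a full 1GB slot, as the thread's own figures do, and that every worker offers 53GB to YARN. The function names are made up for illustration; none of this is Spark or YARN API.

```python
NODE_MEMORY_GB = 53  # YARN memory per worker, taken from the thread
AM_RESERVE_GB = 1    # assumption: the 896MB AM container fills a 1GB slot


def idle_gb_reserve_everywhere(workers: int) -> int:
    """Option 1: every worker leaves 1GB free, but only one hosts the AM."""
    return (workers - 1) * AM_RESERVE_GB


def idle_gb_dedicated_am_node() -> int:
    """Option 2: one extra box hosts only the AM; the rest of it sits idle."""
    return NODE_MEMORY_GB - AM_RESERVE_GB


def break_even_workers() -> int:
    """Worker count at which both options idle the same amount of memory."""
    return idle_gb_dedicated_am_node() // AM_RESERVE_GB + 1


if __name__ == "__main__":
    for workers in (5, 20, 53, 100):
        print(
            f"{workers:>3} workers: "
            f"option 1 idles {idle_gb_reserve_everywhere(workers)}GB, "
            f"option 2 idles {idle_gb_dedicated_am_node()}GB"
        )
    print(f"break-even at {break_even_workers()} workers")
```

Under these assumptions, reserving 1GB per box idles less memory than a dedicated AM box until the cluster reaches about 53 workers, though option 2 also costs an extra machine regardless of cluster size.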
