spark-dev mailing list archives

From Jonathan Kelly <jonathaka...@gmail.com>
Subject Re: spark on yarn wastes one box (or 1 GB on each box) for am container
Date Tue, 09 Feb 2016 16:47:01 GMT
Sean, I'm not sure if that's actually the case, since the AM would be
allocated before the executors are even requested (by the driver through
the AM), right? This must at least be the case with dynamicAllocation
enabled, but I would expect that it's true regardless.

However, Alex, yes, this would be possible on EMR if you use small CORE
instances and larger TASK instances. EMR is configured to run AMs only on
CORE instances, so if you don't need much HDFS space (HDFS is stored only
on CORE instances, not TASK instances), this might be a good option for
you. Note that you would have to set spark.executor.memory yourself
rather than using maximizeResourceAllocation, because
maximizeResourceAllocation currently only considers the size of the CORE
instances when determining spark.{driver,executor}.memory.
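For example, one might pin the executor size to the TASK instance type by hand in spark-defaults.conf. The values below are purely illustrative, not EMR defaults; size them to whatever your TASK instances actually have:

```properties
# spark-defaults.conf -- hypothetical values; size these to your TASK
# instances, since maximizeResourceAllocation would size them to CORE.
spark.executor.memory                20g
spark.yarn.executor.memoryOverhead   2048
```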

~ Jonathan

On Tue, Feb 9, 2016 at 12:40 AM Sean Owen <sowen@cloudera.com> wrote:

> If it's too small to run an executor, I'd think it would be chosen for
> the AM as the only way to satisfy the request.
>
> On Tue, Feb 9, 2016 at 8:35 AM, Alexander Pivovarov
> <apivovarov@gmail.com> wrote:
> > If I add an additional small box to the cluster, can I configure YARN
> > to select the small box to run the AM container?
> >
> >
> > On Mon, Feb 8, 2016 at 10:53 PM, Sean Owen <sowen@cloudera.com> wrote:
> >>
> >> Typically YARN is there because you're mediating resource requests
> >> from things besides Spark, so yeah using every bit of the cluster is a
> >> little bit of a corner case. There's not a good answer if all your
> >> nodes are the same size.
> >>
> >> I think you can let YARN over-commit RAM though, and allocate more
> >> memory than it actually has. It may be beneficial to let them all
> >> think they have an extra GB, and let one node running the AM
> >> technically be overcommitted, a state which won't hurt at all unless
> >> you're really really tight on memory, in which case something might
> >> get killed.
> >>
> >> On Tue, Feb 9, 2016 at 6:49 AM, Jonathan Kelly <jonathakamzn@gmail.com>
> >> wrote:
> >> > Alex,
> >> >
> >> > That's a very good question that I've been trying to answer myself
> >> > recently too. Since you've mentioned before that you're using EMR, I
> >> > assume you're asking this because you've noticed this behavior on
> >> > emr-4.3.0.
> >> >
> >> > In this release, we made some changes to the
> >> > maximizeResourceAllocation (which you may or may not be using, but
> >> > either way this issue is present), including the accidental inclusion
> >> > of somewhat of a bug that makes it not reserve any space for the AM,
> >> > which ultimately results in one of the nodes being utilized only by
> >> > the AM and not an executor.
> >> >
> >> > However, as you point out, the only viable fix seems to be to reserve
> >> > enough memory for the AM on *every single node*, which in some cases
> >> > might actually be worse than wasting a lot of memory on a single node.
> >> >
> >> > So yeah, I also don't like either option. Is this just the price you
> >> > pay for running on YARN?
> >> >
> >> >
> >> > ~ Jonathan
> >> >
> >> > On Mon, Feb 8, 2016 at 9:03 PM Alexander Pivovarov
> >> > <apivovarov@gmail.com>
> >> > wrote:
> >> >>
> >> >> Let's say that YARN has 53GB of memory available on each slave.
> >> >>
> >> >> The Spark AM container needs 896MB (512 + 384).
> >> >>
> >> >> I see two options to configure Spark:
> >> >>
> >> >> 1. Configure Spark executors to use 52GB and leave 1GB free on each
> >> >> box. One box will also run the AM container, so the 1GB will sit
> >> >> unused on every slave but one.
> >> >>
> >> >> 2. Configure Spark to use all 53GB and add an additional 53GB box
> >> >> which will run only the AM container. So 52GB on this additional box
> >> >> will do nothing.
> >> >>
> >> >> I don't like either option. Is there a better way to configure
> >> >> yarn/spark?
> >> >>
> >> >>
> >> >> Alex
> >
> >
>
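For concreteness, the trade-off between the two layouts described in the thread can be put into numbers. This sketch assumes the figures from Alex's example (53GB of YARN memory per slave, an 896MB AM container) plus a hypothetical 10-node cluster:

```python
# Compare idle memory for the two layouts discussed above.
# Assumes 53 GB of YARN memory per slave and a hypothetical 10-node
# cluster; the AM container needs 896 MB (512 MB heap + 384 MB overhead).

nodes = 10
node_mb = 53 * 1024          # YARN memory per slave, in MB
am_mb = 512 + 384            # AM container size: 896 MB

# Option 1: executors use 52 GB everywhere, leaving a 1 GB gap per box;
# only one box actually fills (part of) its gap with the 896 MB AM.
waste_opt1 = (nodes - 1) * 1024 + (1024 - am_mb)

# Option 2: executors use all 53 GB; an extra 53 GB box runs only the AM.
waste_opt2 = node_mb - am_mb

print(waste_opt1)  # 9344 MB (~9.1 GB) idle across the cluster
print(waste_opt2)  # 53376 MB (~52.1 GB) idle on the extra box
```

So with a cluster this small, the extra dedicated box (option 2) idles far more memory than the per-node reservation (option 1); the comparison flips only as the node count grows into the fifties.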
