From: Sandy Ryza <sandy.ryza@cloudera.com>
To: yarn-dev@hadoop.apache.org
Date: Fri, 21 Mar 2014 23:48:02 -0700
Subject: Re: Capacity scheduler puts all containers on one box

As a work-around, if you turn on DRF / multi-resource scheduling, you
could use vcore capacities to limit the number of containers per node.
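To make that concrete, here is a minimal sketch of what the DRF work-around
might look like. The property names are the stock Hadoop 2.x ones; the vcore
numbers are illustrative assumptions rather than values from this thread:

  <!-- capacity-scheduler.xml: switch the Capacity Scheduler from the
       memory-only DefaultResourceCalculator to DRF, so that CPU is also
       considered when placing containers. -->
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>

  <!-- yarn-site.xml on each NodeManager: advertise a small, fixed vcore
       capacity (4 is an illustrative value for one of the 48GB nodes). -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>

With that in place, an AM that asks for 2 vcores per container can get at
most 4 / 2 = 2 containers on any one node, no matter how small its 2GB
memory request is.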
On Fri, Mar 21, 2014 at 11:35 PM, Chris Riccomini wrote:

> Hey Guys,
>
> @Vinod: We aren't overriding the default, so we must be using -1 as the
> setting.
>
> @Sandy: We aren't specifying any racks/hosts when sending the resource
> requests. +1 regarding introducing a similar limit in the capacity
> scheduler.
>
> Any recommended work-arounds in the meantime? Our utilization of the grid
> is very low because we're having to force high memory requests for the
> containers in order to guarantee a maximum number of containers on a
> single node (e.g. setting container memory to 17GB so that no more than
> two containers can be assigned to any one 48GB node).
>
> Cheers,
> Chris
>
> On 3/21/14 11:30 PM, "Sandy Ryza" wrote:
>
> >yarn.scheduler.capacity.node-locality-delay will help if the app is
> >requesting containers at particular locations, but won't help spread
> >things out evenly otherwise.
> >
> >The Fair Scheduler attempts an even spread. By default, it only schedules
> >a single container each time it considers a node. Decoupling scheduling
> >from node heartbeats (YARN-1010) makes it so that a high node heartbeat
> >interval doesn't result in this being slow. Now that the Capacity
> >Scheduler has similar capabilities (YARN-1512), it might make sense to
> >introduce a similar limit?
> >
> >-Sandy
> >
> >On Fri, Mar 21, 2014 at 4:42 PM, Vinod Kumar Vavilapalli wrote:
> >
> >> What's the value of yarn.scheduler.capacity.node-locality-delay? It is
> >> -1 by default in 2.2.
> >>
> >> We fixed the default to a more reasonable 40 (the number of nodes in a
> >> rack) in 2.3.0, which should spread containers out a bit.
> >>
> >> Thanks,
> >> +Vinod
> >>
> >> On Mar 21, 2014, at 12:48 PM, Chris Riccomini wrote:
> >>
> >> > Hey Guys,
> >> >
> >> > We're running YARN 2.2 with the capacity scheduler. Each NM is running
> >> > with 40G of memory capacity. When we request a series of containers
> >> > with 2G of memory from a single AM, we see the RM assigning them
> >> > entirely to one NM until that NM is full, then moving on to the next,
> >> > and so on. Essentially, we have a grid with 20 nodes; two are
> >> > completely full and the rest are completely empty. This is
> >> > problematic because our containers use disk heavily and are
> >> > completely saturating the disks on those two nodes, which slows down
> >> > all of the containers on those NMs.
> >> >
> >> > 1. Is this expected behavior of the capacity scheduler? What about
> >> > the FIFO scheduler?
> >> > 2. Is the recommended work-around just to increase the memory
> >> > allocation per container as a proxy for the disk capacity that's
> >> > required? Given that there's no disk-level isolation, and no
> >> > disk-level resource, I don't see another way around this.
> >> >
> >> > Cheers,
> >> > Chris
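For reference, the node-locality-delay setting discussed above is a single
entry in capacity-scheduler.xml. A minimal sketch using the 40-nodes-per-rack
value Vinod cites for 2.3.0 (it only matters when the app makes node- or
rack-specific requests, which this one does not):

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
  </property>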