Date: Tue, 12 Apr 2016 13:58:10 -0700
Subject: Re: Control rate of preemption?
From: Miles Crawford
To: user@hadoop.apache.org

In looking at the code I found two undocumented config properties:

  yarn.scheduler.fair.preemptionInterval
  yarn.scheduler.fair.waitTimeBeforeKill
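If I'm reading FairSchedulerConfiguration correctly, both are set in
yarn-site.xml and take milliseconds. A minimal sketch of how they'd be
set (the values below are just the defaults I saw in the code, not
recommendations):

  <!-- yarn-site.xml -->
  <property>
    <!-- how often the fair scheduler checks whether preemption
         is needed -->
    <name>yarn.scheduler.fair.preemptionInterval</name>
    <value>5000</value>
  </property>
  <property>
    <!-- grace period between requesting preemption and actually
         killing the container -->
    <name>yarn.scheduler.fair.waitTimeBeforeKill</name>
    <value>15000</value>
  </property>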
But these don't seem to be enough for me, since it appears the fair
scheduler will still preempt as many containers as it would like in a
single operation. I was hoping for something like:

  yarn.scheduler.fair.maxContainersToPreemptPerInterval

so that I could smooth out the rebalance operation over a longer time...

-m

On Mon, Apr 11, 2016 at 9:24 AM, Miles Crawford wrote:
>
> I'm using the YARN fair scheduler to allow a group of users to equally
> share a cluster for running Spark jobs.
>
> Works great, but when a large rebalance happens, Spark sometimes can't
> keep up, and the job fails.
>
> Is there any way to control the rate at which YARN preempts resources?
> I'd love to limit the killing of containers to a slower pace, so Spark
> has a chance to keep up.
>
> Thanks,
> -miles
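P.S. For anyone finding this in the archives: the setup in question is
just plain fair scheduler preemption, roughly like the below (queue
names, weights, and the timeout are placeholders, not my real config):

  <!-- fair-scheduler.xml (the allocation file) -->
  <allocations>
    <!-- one queue per team, equally weighted so they share evenly -->
    <queue name="teamA">
      <weight>1.0</weight>
    </queue>
    <queue name="teamB">
      <weight>1.0</weight>
    </queue>
    <!-- seconds a queue can sit below its fair share before the
         scheduler starts preempting containers from other queues -->
    <defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
  </allocations>

and in yarn-site.xml:

  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>true</value>
  </property>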