Date: Mon, 11 Apr 2016 09:24:28 -0700
Subject: Control rate of preemption?
From: Miles Crawford
To: user@hadoop.apache.org

I'm using the YARN fair scheduler to allow a group of users to equally share a
cluster for running Spark jobs. Works great, but when a large rebalance
happens, Spark sometimes can't keep up, and the job fails.

Is there any way to control the rate at which YARN preempts resources? I'd
love to limit the killing of containers to a slower pace, so Spark has a
chance to keep up.

Thanks,
-miles
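For context, a setup along these lines might look roughly like the sketch
below. This is only an illustration: the queue names, timeouts, and threshold
values are made up, not a real config, and as far as I can tell from the
Hadoop 2.x FairScheduler docs these settings only control when preemption
kicks in, not how fast containers are killed once it does.

yarn-site.xml (enable the fair scheduler and its preemption):

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>true</value>
  </property>
  <property>
    <!-- don't preempt at all unless the cluster is more than 80% utilized -->
    <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
    <value>0.8</value>
  </property>

fair-scheduler.xml (allocation file; equal weights give each user an equal
fair share):

  <?xml version="1.0"?>
  <allocations>
    <!-- hypothetical per-user queues with equal weights -->
    <queue name="alice">
      <weight>1.0</weight>
    </queue>
    <queue name="bob">
      <weight>1.0</weight>
    </queue>

    <!-- wait this many seconds below fair share before preempting containers
         from other queues -->
    <defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
    <!-- only preempt on behalf of a queue that is below this fraction of its
         fair share -->
    <defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
  </allocations>

Raising the timeout or the utilization threshold seems to delay preemption
rather than spread it out, which is why a knob for the preemption rate itself
would help here.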
