From: Eric Payne
To: Zoltán Zvara, Vinod Kumar Vavilapalli
Cc: user@hadoop.apache.org
Date: Wed, 30 Mar 2016 21:17:27 +0000 (UTC)
Subject: Re: YARN re-locate container

I think it would help if I knew what the criteria are for wanting to move the container. In other words, was the container started on an undesirable node in the first place? Or did the node become undesirable after the container started?

Speculation could be considered a "move" operation for containers: if a container isn't finishing fast enough, the default speculator will start another container on a different node. Would it be possible to create a specialized speculator that understood your criteria for needing to move containers? If so, it could be done automatically / programmatically.
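A minimal sketch of the decision logic such a specialized speculator might encode. It distinguishes the two cases asked about above: a container placed on an undesirable node from the start, and a node that became undesirable later. `RelocationPolicy`, `evaluate` and `pickTarget` are hypothetical names and not part of Hadoop's speculator interfaces. "Undesirable" is assumed to be a set of node names supplied by the application.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical relocation policy in the spirit of a specialized speculator.
// None of these types are Hadoop APIs; they only model the decision logic.
final class RelocationPolicy {

    // Why a container should move; mirrors the two cases raised in the thread.
    enum Reason { NONE, PLACED_ON_UNDESIRABLE_NODE, NODE_BECAME_UNDESIRABLE }

    // Node names the application currently considers undesirable (assumed input).
    private final Set<String> undesirableNodes;

    RelocationPolicy(Set<String> undesirableNodes) {
        this.undesirableNodes = undesirableNodes;
    }

    // A container is a relocation candidate only if its node is undesirable now.
    // undesirableAtLaunch records whether the node was already undesirable at start.
    Reason evaluate(String currentNode, boolean undesirableAtLaunch) {
        if (!undesirableNodes.contains(currentNode)) {
            return Reason.NONE;
        }
        return undesirableAtLaunch ? Reason.PLACED_ON_UNDESIRABLE_NODE
                                   : Reason.NODE_BECAME_UNDESIRABLE;
    }

    // Pick the first candidate that is neither the current node nor undesirable.
    Optional<String> pickTarget(String currentNode, List<String> candidates) {
        return candidates.stream()
                .filter(n -> !n.equals(currentNode))
                .filter(n -> !undesirableNodes.contains(n))
                .findFirst();
    }
}
```

For example, with `node-a` marked undesirable, `pickTarget("node-a", List.of("node-a", "node-b"))` yields `node-b`. The policy only decides when and where to move. Carrying out the move (kill and restart) is still left to the application.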

Thanks,
-Eric



From: Zoltán Zvara <zoltan.zvara@gmail.com>
To: Vinod Kumar Vavilapalli <vinodkv@apache.org>
Cc: user@hadoop.apache.org
Sent: Wednesday, March 30, 2016 9:10 AM
Subject: Re: YARN re-locate container

How is this achieved? As far as I can see, after stopping a container, the AM must reallocate the same container, with the same resource vector but with locality preferences pointing to the new, target node. Once the new lease has been acquired, the AM can take it to the new node and send a `startContainers` message.
Our use case with Ericsson would require a simpler API, where (for example) a `moveContainer` call from the AM would ask the RM or NM to move a container from one node to another (or to any of a specified set of preferred nodes). A move would simply kill the container and restart it on another node whenever possible. This raises questions about scheduling: how should container moves be handled? Probably not like simple allocations.
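For illustration, the stop / reallocate / start sequence described above can be modeled in a few lines and wrapped in a `moveContainer`-style helper on the AM side. This is a hypothetical, single-threaded, in-memory sketch. `ToyCluster`, `allocateOn` and `moveContainer` are invented names, allocation is synchronous, and resources are reduced to memory only. In a real ApplicationMaster, the steps would map roughly onto `NMClient.stopContainer`, an `AMRMClient.ContainerRequest` naming the target node with `relaxLocality` set to false, and `NMClient.startContainer` once the container arrives in an `allocate` response.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model of the relocation steps; ToyCluster and moveContainer are
// hypothetical illustrations, not YARN APIs.
final class ToyCluster {
    private final Map<String, Integer> freeMb = new HashMap<>();          // node -> free memory (MB)
    private final Map<String, String> nodeOfContainer = new HashMap<>();  // container id -> node
    private final Map<String, Integer> sizeOfContainer = new HashMap<>(); // container id -> memory (MB)
    private int nextId = 1;

    void addNode(String node, int memoryMb) {
        freeMb.put(node, memoryMb);
    }

    // Allocate and start a container on exactly this node (strict locality):
    // succeeds only if the node exists and has enough free memory.
    Optional<String> allocateOn(String node, int memoryMb) {
        Integer free = freeMb.get(node);
        if (free == null || free < memoryMb) {
            return Optional.empty();
        }
        freeMb.put(node, free - memoryMb);
        String id = "container_" + nextId++;
        nodeOfContainer.put(id, node);
        sizeOfContainer.put(id, memoryMb);
        return Optional.of(id);
    }

    // Stop a running container (id assumed valid) and return its memory to the node.
    void stop(String containerId) {
        String node = nodeOfContainer.remove(containerId);
        int memoryMb = sizeOfContainer.remove(containerId);
        freeMb.merge(node, memoryMb, Integer::sum);
    }

    String nodeOf(String containerId) {
        return nodeOfContainer.get(containerId);
    }

    // Hypothetical moveContainer: kill the container, then restart one of the same
    // size on the first preferred node (other than the current one) with room for it.
    // If none fits, restart on the original node, whose capacity the kill just freed.
    // The result is a NEW container id; any in-memory state of the old one is lost.
    String moveContainer(String containerId, List<String> preferredNodes) {
        String original = nodeOfContainer.get(containerId);
        int memoryMb = sizeOfContainer.get(containerId);
        stop(containerId);
        for (String node : preferredNodes) {
            if (node.equals(original)) {
                continue;
            }
            Optional<String> moved = allocateOn(node, memoryMb);
            if (moved.isPresent()) {
                return moved.get();
            }
        }
        return allocateOn(original, memoryMb)
                .orElseThrow(() -> new IllegalStateException("original node lost capacity"));
    }
}
```

The sketch stops the container first, which matches the order described above. An alternative is to obtain the new allocation before killing the old container. That way the work is never left without a slot, at the cost of briefly holding both resources. In a shared cluster the fallback to the original node is also not guaranteed, because another application may claim the freed capacity in the meantime.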

Am I understanding the architecture correctly here?

On Tue, Mar 29, 2016 at 7:31 PM Vinod Kumar Vavilapalli <vinodkv@apache.org> wrote:
Containers can already be restarted on other machines today; YARN just leaves it up to the applications to do so.

Are you looking for anything more specific?

+Vinod

> On Mar 29, 2016, at 9:45 AM, Zoltán Zvara <zoltan.zvara@gmail.com> wrote:
>
> Dear Hadoop Community,
>
> Is there any feature available, or on the road map, to support the relocation of containers? (Simply restart the container on another machine.)
>
> Thanks,
> Zoltán


