Date: Tue, 16 Apr 2013 14:40:47 +0530
Subject: Re: VM reuse!
From: Bejoy Ks
To: "mapreduce-user@hadoop.apache.org"

Hi Rahul

Consider larger clusters and jobs that involve larger input data sets. The data would be spread across the whole cluster, and a single node might hold several blocks of that data set. Imagine you have a cluster with 100 map slots and your job has 500 map tasks. In that case each TaskTracker will have to run multiple map tasks from the job, depending on slot availability.
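
To put rough numbers on that, here is a small sketch in plain Java (not Hadoop code). The class and method names are made up for illustration, and it assumes every map task takes about the same time and ignores data locality, so treat it as an idealized estimate only.

```java
// Hypothetical helper, not part of Hadoop: estimates how many sequential
// "waves" of map tasks are needed when a job has more tasks than slots.
final class MapWaves {

    // Ceiling division: every slot runs one task per wave.
    static int waves(int mapTasks, int mapSlots) {
        if (mapTasks < 0 || mapSlots <= 0) {
            throw new IllegalArgumentException("need mapTasks >= 0 and mapSlots > 0");
        }
        return (mapTasks + mapSlots - 1) / mapSlots;
    }

    public static void main(String[] args) {
        int mapSlots = 100; // cluster-wide map slots, from the example above
        int mapTasks = 500; // map tasks in the job, from the example above
        // Each slot ends up running about this many tasks one after another.
        System.out.println("waves = " + waves(mapTasks, mapSlots));
    }
}
```

With 500 tasks and 100 slots this gives 5, so on average every slot runs five map tasks of the same job back to back.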

Here, if you enable JVM reuse, the tasks of a job that run one after another on a single TaskTracker would reuse the same JVM instead of each starting a new one. The benefit is just the time you save in spawning and cleaning up a JVM for each individual task.
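
A rough way to see the saving is to count JVM launches on one node. The sketch below is plain Java, not Hadoop code, and its names are hypothetical. It assumes the job's tasks are spread evenly over the node's slots and that each slot runs its tasks sequentially. The `tasksPerJvm` parameter is modeled on the Hadoop 1.x job property `mapred.job.reuse.jvm.num.tasks` (1 means no reuse, -1 means no limit).

```java
// Hypothetical model, not part of Hadoop: counts JVM launches for one job
// on one TaskTracker under a simplified, evenly balanced schedule.
final class JvmReuseEstimate {

    // tasksPerJvm: 1 = a fresh JVM per task, -1 = reuse without limit,
    // n > 1 = each JVM runs at most n tasks before being replaced.
    static int jvmLaunches(int tasksOnNode, int slotsOnNode, int tasksPerJvm) {
        if (tasksOnNode < 0 || slotsOnNode <= 0 || tasksPerJvm == 0 || tasksPerJvm < -1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        int launches = 0;
        for (int slot = 0; slot < slotsOnNode; slot++) {
            // Round-robin share of the node's tasks that lands on this slot.
            int share = tasksOnNode / slotsOnNode
                    + (slot < tasksOnNode % slotsOnNode ? 1 : 0);
            if (share == 0) {
                continue; // idle slot, no JVM needed
            }
            launches += (tasksPerJvm == -1)
                    ? 1                                        // one JVM serves the whole slot
                    : (share + tasksPerJvm - 1) / tasksPerJvm; // ceiling division
        }
        return launches;
    }

    public static void main(String[] args) {
        // Assumed node: 2 map slots, 10 map tasks of the job scheduled on it.
        System.out.println("no reuse:        " + jvmLaunches(10, 2, 1));
        System.out.println("unlimited reuse: " + jvmLaunches(10, 2, -1));
        System.out.println("3 tasks per JVM: " + jvmLaunches(10, 2, 3));
    }
}
```

Under these assumptions the node launches 10 JVMs without reuse and 2 with unlimited reuse. Tasks in different slots still run in parallel in separate JVMs, so reuse only removes the startup and teardown cost for tasks that run sequentially in the same slot.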




On Tue, Apr 16, 2013 at 2:04 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com> wrote:
> Hi,
>
> I have a question related to VM reuse in Hadoop. I now understand the purpose of VM reuse, but I am wondering how it is useful.
>
> For example, for VM reuse to be effective or to kick in, we need more than one mapper task to be submitted to a single node (for the same job). Hadoop would consider spawning mappers on nodes which actually contain the data, so it might rarely happen that multiple mappers are allocated to a single task tracker. And even if a single node gets to run multiple mappers, they might as well run in parallel in multiple VMs rather than sequentially in a single VM.
>
> I am sure I am missing some link here, please help me find that.
>
> Thanks,
> Rahul
