Subject: Re: Performance in running jobs at the same time
From: sudhakara st
To: user@hadoop.apache.org
Date: Mon, 27 Jan 2014 17:27:39 +0530
In-Reply-To: <20140127053904.7426edd2@gmail.com>

On Mon, Jan 27, 2014 at 11:09 AM, xeon wrote:

1 - I installed Hadoop MRv2 in virtual machines. When jobs are
running, I try to list them with "hadoop job -list", but the command
takes a long time to execute. This happens because of the
performance of the VM. I just wonder how it works on big machines.
Does anyone have an idea whether Hadoop commands take long to launch
while jobs are executing?

>> Getting job information involves communication with the ResourceManager and the ApplicationMaster. Because the resources (CPU, memory) available in your VM are very limited, the hadoop command may take a long time to return the job information.
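
As a rough illustration of what such a listing involves, the Python sketch below asks the ResourceManager REST API for running applications instead of starting a client JVM through the hadoop command, which may or may not help on a constrained VM. The host name rm-host, the port 8088, the `states` query parameter and the helper names are assumptions for illustration; check them against the REST documentation for your Hadoop version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def running_apps_url(rm_host, rm_port=8088):
    """Build the RM REST URL that lists applications in the RUNNING state."""
    query = urlencode({"states": "RUNNING"})
    return f"http://{rm_host}:{rm_port}/ws/v1/cluster/apps?{query}"


def parse_apps(payload):
    """Extract (id, name, state) tuples from the JSON body of /ws/v1/cluster/apps."""
    # The RM returns "apps": null when nothing matches, hence the `or {}`.
    apps = (json.loads(payload).get("apps") or {}).get("app", [])
    return [(a["id"], a["name"], a["state"]) for a in apps]


def list_running_apps(rm_host, rm_port=8088, timeout=10):
    """Fetch and parse the running applications; needs network access to the RM."""
    with urlopen(running_apps_url(rm_host, rm_port), timeout=timeout) as resp:
        return parse_apps(resp.read().decode("utf-8"))
```

Calling `list_running_apps("rm-host")` from a node that can reach the RM would return one tuple per running application.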

2 - I want to run several jobs at the same time. How can I configure
the maximum number of jobs that I can run at the same time?

>> Once you submit your job to the RM, the scheduler decides how to run it, based on which scheduler you use and on the resources available in your cluster. To control the submission order or the number of jobs running at any instant, you have to write or customize a scheduler.
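
One possible reading of "customize" is configuring a stock scheduler rather than writing one. As a sketch only, assuming the Fair Scheduler is enabled on the RM, the Python below generates an allocation file that caps the number of running applications per queue through `maxRunningApps` and per user through `userMaxAppsDefault`. The queue name "default", the limits 3 and 2, and the function name `build_allocations` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_allocations(queue_limits, user_max_apps_default=None):
    """Return Fair Scheduler allocation XML capping running apps per queue."""
    root = ET.Element("allocations")
    for queue_name, max_running in queue_limits.items():
        queue = ET.SubElement(root, "queue", name=queue_name)
        ET.SubElement(queue, "maxRunningApps").text = str(max_running)
    if user_max_apps_default is not None:
        # Default cap on running apps for any user without an explicit limit.
        ET.SubElement(root, "userMaxAppsDefault").text = str(user_max_apps_default)
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: at most 3 concurrent apps in "default", 2 per user.
print(build_allocations({"default": 3}, user_max_apps_default=2))
```

The generated text would be saved as the file named by `yarn.scheduler.fair.allocation.file` in yarn-site.xml. Applications beyond the cap wait in the queue rather than being rejected.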

3 - Is there a calculation of how many jobs I can run at the same time
for a specific environment, similar to how many reduces we should set in
our jobs?

>> If you have a clear idea of how much data your jobs will process, how many resources they will use, and how many resources are available in the cluster in total, then you can work out how many jobs can run at one time. That is possible only when every cycle handles a fixed data set; in a real environment it is not possible to calculate these things for each job in each run. In Hadoop 2 the RM takes care of all resource management, so you do not need to take special care of these things. If you need jobs to be processed in order, look at a tool such as Oozie to control the order of MR jobs.
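
For the fixed-workload case, the arithmetic can be sketched as follows. This is a memory-only estimate under the assumption that every job holds one ApplicationMaster container plus all of its task containers at the same time; the function name and all figures are hypothetical, and a real cluster will also be bounded by CPU and by scheduler settings.

```python
def max_concurrent_jobs(cluster_memory_mb, am_memory_mb,
                        container_memory_mb, containers_per_job):
    """Estimate how many jobs fit in memory if each runs all containers at once."""
    if min(cluster_memory_mb, am_memory_mb, container_memory_mb) <= 0:
        raise ValueError("memory sizes must be positive")
    if containers_per_job < 1:
        raise ValueError("a job needs at least one task container")
    # One AM container plus the task containers the job holds simultaneously.
    per_job_mb = am_memory_mb + containers_per_job * container_memory_mb
    return cluster_memory_mb // per_job_mb


# Hypothetical cluster: 32 GB for containers, 1.5 GB AM, ten 1 GB tasks per job.
print(max_concurrent_jobs(32768, 1536, 1024, 10))  # prints 2
```

Each job here needs 1536 + 10 * 1024 = 11776 MB, so two jobs fit in 32768 MB and a third would have to wait for containers to free up.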

--
Regards,
...Sudhakara.st