From: Harsh J <harsh@cloudera.com>
Date: Tue, 30 Oct 2012 23:20:43 +0530
Subject: Re: Memory based scheduling
To: user@hadoop.apache.org

Arun's correct: on the 0.20.x and 1.x line, the CapacityScheduler (CS) allows "slot reservation" based on the memory requests provided by each job. You'd have to set your cluster's maximum allowed per-task memory request [0], define the per-"slot" maximum memory unit [1], then set, on a per-job basis, the real memory request you need [2]. While the framework itself allows memory-usage monitoring in general, the CS further allows "slot management" based on the requested resources: if you request a 4 GB map task memory resource [2] on a cluster slot definition of 2 GB [1], two slots get reserved to run such a task JVM. Arun's link has more info on setting up the whole CS.

Btw, you may also want https://issues.apache.org/jira/browse/MAPREDUCE-4001 and https://issues.apache.org/jira/browse/MAPREDUCE-3789 in your Hadoop release/distribution, since your environment is heterogeneous, and the 1.x/0.20.x CS without these fixes applied might end up wasting some cluster resources unnecessarily. I'd also recommend looking at YARN, which is driven purely by resource requests (memory currently, but soon CPU and others).
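As a rough sketch, the cluster-side knobs named in the footnotes below might look like this in mapred-site.xml (the property names come from this thread; the values are purely illustrative, not recommendations):

```xml
<!-- mapred-site.xml sketch: illustrative values only -->
<property>
  <name>mapred.cluster.map.memory.mb</name>
  <value>2048</value> <!-- [1] memory size of one map "slot" -->
</property>
<property>
  <name>mapred.cluster.reduce.memory.mb</name>
  <value>2048</value> <!-- [1] memory size of one reduce "slot" -->
</property>
<property>
  <name>mapred.cluster.max.map.memory.mb</name>
  <value>8192</value> <!-- [0] largest per-map-task request a job may make -->
</property>
<property>
  <name>mapred.cluster.max.reduce.memory.mb</name>
  <value>8192</value> <!-- [0] largest per-reduce-task request a job may make -->
</property>
```

A job would then state its real need at submit time, e.g. with `-Dmapred.job.map.memory.mb=4096` [2]; with a 2 GB slot definition as above, the CS would reserve two slots for each such map task.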
[0] - mapred.cluster.max.map.memory.mb and mapred.cluster.max.reduce.memory.mb
[1] - mapred.cluster.map.memory.mb and mapred.cluster.reduce.memory.mb
[2] - mapred.job.map.memory.mb and mapred.job.reduce.memory.mb

On Tue, Oct 30, 2012 at 10:54 PM, Arun C Murthy wrote:
> Not true, take a look at my prev. response.
>
> On Oct 30, 2012, at 9:08 AM, lohit wrote:
>
> As far as I recall this is not possible. Per-job or per-user configurations
> like these are a little difficult in the existing version.
> What you could try is to set the max maps per job to be, say, half of the
> cluster capacity. (This is possible with the FairScheduler; I do not know
> about the CapacityScheduler.)
> For example, if you have 10 nodes with 4 slots each, you would create a
> pool and set max maps to be 20.
> The JobTracker will try its best to spread tasks across nodes provided
> there are empty slots. But again, this is not guaranteed.
>
> 2012/10/30 Marco Zühlke
>>
>> Hi,
>>
>> on our cluster our jobs are usually satisfied with less than 2 GB of heap
>> space, so on our 8 GB computers we run at most 3 maps and on our 16 GB
>> computers at most 4 maps (we only have quad-core CPUs and want to keep
>> memory left for reducers). This works very well.
>>
>> But now we have a new kind of job. Each mapper requires at least 4 GB
>> of heap space.
>>
>> Is it possible to limit the number of tasks (mappers) per computer to
>> 1 or 2 for these kinds of jobs?
>>
>> Regards,
>> Marco
>>
>
> --
> Have a Nice Day!
> Lohit
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/

--
Harsh J