From: Arun C Murthy <acm@hortonworks.com>
Subject: Re: Hadoop efficient resource isolation
Date: Tue, 26 Feb 2013 06:02:56 -0800
To: user@hadoop.apache.org

CapacityScheduler has features that allow a user to specify the amount of virtual memory per map/reduce task, and the TaskTracker monitors all tasks and their process trees to ensure fork bombs don't kill the node.
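A minimal mapred-site.xml sketch of those per-task limits, using the Hadoop 1.x property names (the sizes shown are illustrative, and the reduce-side properties follow the same pattern):

    <!-- Slot size: virtual memory assumed per map slot on each TaskTracker -->
    <property>
      <name>mapred.cluster.map.memory.mb</name>
      <value>2048</value>
    </property>

    <!-- Ceiling on what any single job may request per map task -->
    <property>
      <name>mapred.cluster.max.map.memory.mb</name>
      <value>4096</value>
    </property>

    <!-- What a given job requests per map task; a task whose process
         tree grows beyond this is killed by the TaskTracker's monitor -->
    <property>
      <name>mapred.job.map.memory.mb</name>
      <value>2048</value>
    </property>

The monitoring interval is controlled by mapred.tasktracker.taskmemorymanager.monitoring-interval (in milliseconds).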
On Feb 25, 2013, at 8:27 PM, Marcin Mejran wrote:

> That won't stop a bad job (say a fork bomb or a massive memory leak in a streaming script) from taking out a node, which is what I believe Dhanasekaran was asking about. He wants to physically isolate certain jobs to certain "non-critical" nodes. I don't believe this is possible, and data would be spread to those nodes, assuming they're data nodes, which would still cause cluster-wide issues (and if the data is isolated, why not have two separate clusters?).
>
> I've read references in the docs about some type of memory-based constraints in Hadoop, but I don't know the details. Does anyone know how they work?
>
> Also, I believe there are tools in Linux that can kill processes in case of memory issues and otherwise restrict what a certain user can do. These seem like a more flexible solution, although they won't cover all potential issues.
>
> -Marcin
>
> On Feb 25, 2013, at 7:20 PM, "Arun C Murthy" <acm@hortonworks.com> wrote:
>
>> CapacityScheduler is what you want...
>>
>> On Feb 21, 2013, at 5:16 AM, Dhanasekaran Anbalagan wrote:
>>
>>> Hi Guys,
>>>
>>> Is it possible to isolate job submission in a Hadoop cluster? We currently run a 48-machine cluster, and we have found that Hadoop does not provide efficient resource isolation. In our case we run a tech pool and a research pool; when a tech job developed a memory leak, it occupied the whole cluster. We eventually traced the problem to the tech job, but by then it had screwed up the whole Hadoop cluster and 10 DataNodes were dead.
>>>
>>> Is there a way to allocate resources so that when something goes wrong in a particular job, it affects only that job's pool and not other jobs? Is there any way to achieve this?
>>>
>>> Please guide me, guys.
>>>
>>> My idea is: when a tech user submits a job, it should run only on, in our case, 24 machines; the other machines would be reserved for research users.
>>>
>>> That would contain the memory-leak problem.
>>>
>>> -Dhanasekaran.
>>> Did I learn something today? If not, I wasted it.
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
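Following up on Arun's pointer, a minimal sketch of a two-queue CapacityScheduler setup for the tech/research split described above, using Hadoop 1.x property names (the 50/50 split mirrors the proposed 24/24 machine division and is purely illustrative):

    <!-- mapred-site.xml: enable the CapacityScheduler and declare queues -->
    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
    </property>
    <property>
      <name>mapred.queue.names</name>
      <value>tech,research</value>
    </property>

    <!-- capacity-scheduler.xml: give each queue half the cluster, and cap
         it there so a runaway job cannot spill into the other queue -->
    <property>
      <name>mapred.capacity-scheduler.queue.tech.capacity</name>
      <value>50</value>
    </property>
    <property>
      <name>mapred.capacity-scheduler.queue.tech.maximum-capacity</name>
      <value>50</value>
    </property>
    <property>
      <name>mapred.capacity-scheduler.queue.research.capacity</name>
      <value>50</value>
    </property>
    <property>
      <name>mapred.capacity-scheduler.queue.research.maximum-capacity</name>
      <value>50</value>
    </property>

Jobs are then steered to a queue at submission time, e.g. with -Dmapred.job.queue.name=tech.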
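As for the Linux-side tools Marcin mentions, per-user resource limits are one option; a minimal /etc/security/limits.conf sketch, assuming the task JVMs run as a (hypothetical) user named mapred:

    # Cap any single process owned by 'mapred' at ~4 GB of address
    # space (the 'as' item is in KB; the value is illustrative)
    mapred  hard  as  4194304

Kernel control groups (cgroups) offer finer-grained memory and CPU limits, but neither mechanism can tell one job's tasks from another's when they all run as the same user.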