From: Marcin Mejran <marcin.mejran@hooklogic.com>
To: ""
CC: "user@hadoop.apache.org"
Subject: Re: Hadoop efficient resource isolation
Date: Tue, 26 Feb 2013 04:27:35 +0000
Message-ID: <4839F0AF-25D4-4141-87B7-195CE6BB89B6@hooklogic.com>
In-Reply-To: <2AF7313F-73BC-44B2-9E3F-D9183F9D0BAB@hortonworks.com>
Reply-To: user@hadoop.apache.org
Content-Type: text/plain; charset="us-ascii"
That won't stop a bad job (say, a fork bomb or a massive memory leak in a streaming script) from taking out a node, which is what I believe Dhanasekaran was asking about. He wants to physically isolate certain jobs to certain "non-critical" nodes. I don't believe this is possible, and data would still be spread to those nodes (assuming they're data nodes), which could still cause cluster-wide issues. And if the data is isolated, why not run two separate clusters?

I've read references in the docs to some type of memory-based constraints in Hadoop, but I don't know the details. Does anyone know how they work?

Also, I believe there are tools in Linux that can kill processes when memory runs out and otherwise restrict what a given user can do. These seem like a more flexible solution, although they won't cover all potential issues.

-Marcin

On Feb 25, 2013, at 7:20 PM, "Arun C Murthy" <acm@hortonworks.com> wrote:

CapacityScheduler is what you want...

On Feb 21, 2013, at 5:16 AM, Dhanasekaran Anbalagan wrote:

Hi Guys,

Is it possible to isolate job submissions in a Hadoop cluster? We currently run a 48-machine cluster, and from our monitoring, Hadoop does not provide efficient resource isolation. In our case we run tech and research pools. When a tech job had a memory leak, it occupied the whole cluster. By the time we figured out that the issue was with the tech job, it had screwed up the whole Hadoop cluster and 10 data nodes were dead.

Is there a way to allocate resources so that when something goes wrong in a particular job, it only affects that job's pool and not other jobs? How can we achieve this?

Please guide me, guys.

My idea is: when a tech user submits a job, it should only run on (in our case) 24 machines; the other machines would be reserved for research users. That would prevent the memory leak problem.

-Dhanasekaran.
Did I learn something today? If not, I wasted it.

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
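Arun's CapacityScheduler pointer maps Dhanasekaran's 24-of-48-machine idea onto per-queue capacity shares rather than dedicated machines. A minimal sketch for a Hadoop 1.x setup — the queue names `tech` and `research` and the 50/50 split are assumptions drawn from the scenario in the thread, not tested configuration:

```xml
<!-- mapred-site.xml: declare the two queues (sketch; values illustrative) -->
<property>
  <name>mapred.queue.names</name>
  <value>tech,research</value>
</property>

<!-- capacity-scheduler.xml: give each pool half the cluster's task slots -->
<property>
  <name>mapred.capacity-scheduler.queue.tech.capacity</name>
  <value>50</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.research.capacity</name>
  <value>50</value>
</property>
```

A job would then target a queue with `-Dmapred.job.queue.name=tech`. As Marcin notes, this caps a pool's share of slots rather than physically fencing off nodes, so a misbehaving task can still hurt whichever node it lands on.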
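The memory-based constraints Marcin mentions may be the per-task limits available in 1.x-era Hadoop; a hedged sketch (property names from that era, values illustrative rather than recommended):

```xml
<!-- mapred-site.xml sketch: bound each task JVM so one leaking job
     fails its own tasks instead of exhausting the node's memory -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>   <!-- max heap per map/reduce child JVM -->
</property>
<property>
  <name>mapred.child.ulimit</name>
  <value>2097152</value>     <!-- ulimit -v for task children, in KB (~2 GB) -->
</property>
```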
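The Linux-level tools Marcin alludes to include per-process address-space limits (`ulimit -v`, or per-user entries in `limits.conf`) and, more coarsely, the kernel OOM killer. A small demonstration of the mechanism, assuming a Linux host with `python3` on the PATH:

```shell
# Run a deliberately memory-hungry process under a 1 GiB virtual-memory cap.
# The oversized allocation fails cleanly instead of dragging down the node.
(
  ulimit -v 1048576   # cap this subshell's address space (value is in KB)
  python3 -c 'x = bytearray(2 * 1024**3)' 2>/dev/null \
    && echo "allocation succeeded" \
    || echo "allocation blocked"
)
```

On a cluster this would be applied per user in `/etc/security/limits.conf`; the subshell above just shows the effect of the limit.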