From: Xie Liang <xieliang@xiaomi.com>
To: user@hadoop.apache.org
Subject: Re: OOM/crashes due to process number limit
Date: Fri, 19 Oct 2012 04:10:48 +0000
What's the exact OOM error message? Is it something like "OutOfMemoryError: unable to create new native thread"?
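For reference, that particular error is thrown by `Thread.start()` when the JVM cannot create the underlying native (OS) thread, independent of heap usage. A minimal sketch (hypothetical class name, with a small cap so it does not actually exhaust the machine):

```java
// Each started Thread consumes one native thread, which Linux counts
// against the per-user "nproc" limit. When that limit is exhausted,
// Thread.start() throws
//   java.lang.OutOfMemoryError: unable to create new native thread
// even though heap (RAM) usage is fine.
public class ThreadLimitDemo {

    // Start up to `cap` threads that block forever; return how many started.
    static int spawnBlockedThreads(int cap) {
        int started = 0;
        try {
            for (int i = 0; i < cap; i++) {
                Thread t = new Thread(() -> {
                    try {
                        Thread.sleep(Long.MAX_VALUE); // park the thread
                    } catch (InterruptedException ignored) {
                    }
                });
                t.setDaemon(true); // let the JVM exit despite live threads
                t.start();
                started++;
            }
        } catch (OutOfMemoryError e) {
            // With the cap removed, this is where the error above appears.
            System.out.println("OOM after " + started + " threads: " + e.getMessage());
        }
        return started;
    }

    public static void main(String[] args) {
        System.out.println("Started " + spawnBlockedThreads(200) + " threads");
    }
}
```

With a large enough cap (or a low enough `ulimit -u`), the catch block fires and prints the same message quoted above.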

From: Aiden Bell [aiden449@gmail.com]
Sent: October 18, 2012, 22:24
To: user@hadoop.apache.org
Subject: OOM/crashes due to process number limit

Hi All,

I'm running quite a basic map/reduce job with 10 or so map tasks. During the task's execution, the
entire stack (and my OS, for that matter) starts failing because it is unable to fork() new processes.
It seems Hadoop (1.0.3) is creating 700+ threads and exhausting this resource. RAM utilisation is fine, however.
This still occurs with ulimit set to unlimited.
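One thing worth checking when `ulimit` looks unlimited: a limit changed in an interactive shell does not apply to daemons that were already running (the TaskTracker inherited its limits at startup), and the limit native threads count against is "max processes" (`ulimit -u`), not open files. On Linux, a hedged sketch (hypothetical class name) that prints the limits actually in effect for the current JVM from `/proc/self/limits`:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Print the limits the *running* JVM actually has, rather than the ones
// shown by `ulimit` in a fresh shell. "Max processes" is the per-user
// nproc limit that each native thread counts against.
public class ShowLimits {

    // Return the header row plus the "Max processes" row.
    static List<String> processLimitLines() throws IOException {
        return Files.readAllLines(Paths.get("/proc/self/limits")).stream()
                .filter(l -> l.startsWith("Limit") || l.startsWith("Max processes"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        for (String line : processLimitLines()) {
            System.out.println(line);
        }
    }
}
```

Running the same check from inside a task (or against the TaskTracker's PID via `/proc/<pid>/limits`) would show whether the unlimited setting actually reached the Hadoop processes.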

Any ideas or advice would be great; it seems very sketchy for a task that doesn't require much grunt.

Cheers!
