Date: Wed, 17 Sep 2014 10:25:11 +0800
Subject: hadoop cluster crash problem
From: Li Li
To: user@hadoop.apache.org

Hi all,

I know this is probably not a problem with Hadoop itself, but our administrator cannot find any clues.

I have a machine with 24 cores and 64 GB of memory running Ubuntu 12.04 LTS. We use VirtualBox to create 4 virtual machines, each with 10 GB of memory and 6 cores. I have set up a small Hadoop 1.2.1 cluster with one JobTracker/NameNode and 3 TaskTracker/DataNode nodes. Each TaskTracker has 4 map slots and 4 reduce slots.

The setup keeps crashing (the host machine crashes, not the VMs). Sometimes it crashes during the first MapReduce job; sometimes it runs a few jobs before crashing.

Are there any clues? I have checked the system log and cannot find anything useful. According to our monitoring system, CPU and I/O are not abnormal. The only abnormal thing is a high context-switch count, about 40k.
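
For reference, the allocation described above can be tallied with a short sketch. All figures come from the description in this message; the variable and function names are only illustrative, and the tally by itself does not show what causes the host crash.

```python
# Resource tally for the setup described in this message.
# Figures are from the message text; names are illustrative assumptions.
HOST_CORES = 24
HOST_MEM_GB = 64
VM_COUNT = 4          # 1 JobTracker/NameNode VM + 3 TaskTracker/DataNode VMs
VM_CORES = 6
VM_MEM_GB = 10
TASKTRACKERS = 3
MAP_SLOTS = 4         # per TaskTracker
REDUCE_SLOTS = 4      # per TaskTracker


def tally():
    """Return how much of the host is assigned to guests and to task slots."""
    vcpus = VM_COUNT * VM_CORES
    vm_mem = VM_COUNT * VM_MEM_GB
    slots_per_tt = MAP_SLOTS + REDUCE_SLOTS
    return {
        "vcpus_allocated": vcpus,
        "host_cores_left": HOST_CORES - vcpus,
        "vm_mem_gb": vm_mem,
        "host_mem_left_gb": HOST_MEM_GB - vm_mem,
        "slots_per_tasktracker": slots_per_tt,
        "total_task_slots": TASKTRACKERS * slots_per_tt,
        "tasktracker_vcpus": TASKTRACKERS * VM_CORES,
    }


if __name__ == "__main__":
    for name, value in tally().items():
        print(f"{name}: {value}")
```

With these numbers, the guests are assigned as many virtual cores as the host has physical cores, and each TaskTracker VM has 8 task slots for 6 virtual cores. Whether this is related to the crash is not established.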