From: Steve Loughran
Date: Wed, 30 Jun 2010 11:02:51 +0100
To: general@hadoop.apache.org
Subject: Re: What are uses of taskTracker and JobTracker services?

Hemanth Yamijala wrote:
> Hi,
>
>> I think that he was trying to explain that in HDFS, you have a name node and then your data nodes.
>> So you have the name node service on the name node and each data node has a data node service.
>> When you run a map reduce job, you have a Job tracker that resides on the name node and controls the overall job.
>
> May or may not be true. In general, for moderately complex cases, it
> is best to run the name node and jobtracker on different nodes so both
> masters don't fail where only one of them can.

More for scale than availability, was my belief; if the NN goes offline, your JT locks up until it comes back anyway.

>> On each data node, where the jobs run in parallel, there exists a task tracker.
>
> This is almost always true, of course - it helps Hadoop to achieve
> data locality by colocating where the task runs with where it has to
> read data from.

If you have machines in the room which can come and go without warning (doing Hadoop work with spare cycles), then you can make them task-tracker only. That way you don't store persistent data there, just temp files, and when the machines get switched to other work you don't lose HDFS data. But you do increase network traffic, as tasks on those machines have no local blocks and must read their input from datanodes elsewhere.
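
To make that network-traffic trade-off concrete, here is a toy model in Python. It is only a sketch and not Hadoop's scheduler: the host names, the block placement and the count_remote_reads helper are all made up for illustration. It counts how many tasks end up on a host that holds no replica of their input block.

```python
# Toy model of data locality; not Hadoop code.
# Host names, block ids and placements below are hypothetical.

def count_remote_reads(block_locations, assignments):
    """Count tasks whose assigned host holds no replica of their input block.

    block_locations: dict of block id -> set of hosts holding a replica
    assignments: dict of block id -> host whose task tracker runs the task
    """
    return sum(
        1
        for block, host in assignments.items()
        if host not in block_locations[block]
    )

# Three blocks, with replicas only on hosts that run a datanode.
block_locations = {
    "blk_1": {"dn1", "dn2"},
    "blk_2": {"dn2", "dn3"},
    "blk_3": {"dn1", "dn3"},
}

# Every task lands on a host that is both datanode and task tracker.
colocated = {"blk_1": "dn1", "blk_2": "dn2", "blk_3": "dn3"}

# Two tasks land on task-tracker-only hosts, which store no HDFS blocks.
with_tt_only = {"blk_1": "dn1", "blk_2": "tt1", "blk_3": "tt2"}

remote_colocated = count_remote_reads(block_locations, colocated)
remote_tt_only = count_remote_reads(block_locations, with_tt_only)
print(remote_colocated, remote_tt_only)
```

In this model every task placed on a task-tracker-only host counts as a remote read, which is the extra network traffic mentioned above.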