From: Gianlorenzo Thione
To: hadoop-user@lucene.apache.org
Cc: Barney Pell
Subject: Re: Multiple tasktrackers per node
Date: Wed, 24 May 2006 22:51:28 -0700
Message-Id: <563DFE2F-2EEE-4DB8-AE68-268806A7018E@gmail.com>
In-Reply-To: <44746E29.1030803@dragonflymc.com>
References: <5438AA87-1469-4F49-BABF-43E3A6BD1856@gmail.com> <44746E29.1030803@dragonflymc.com>

Thanks for the answer. So far I am still trying to understand how each
tasktracker gets multiple map or reduce tasks to be executed
simultaneously. I have run a simple job with 53 map tasks on 5 nodes,
and at all times each node was executing a single task. Each cluster
node is a 4 core machine, so theoretically this was a 16-node cluster
and I feel that the resources were actually underutilized.

Am I missing something? Is there a parameter for a minimum number of
tasks to be executed in parallel (I found a parameter for setting a
maximum [which I set to 4])? If I run 4 TaskTrackers per node then each
node gets a map task at the same time and execution seems overall much
faster.

I'd appreciate help and insights with respect to this matter.
Eventually each map task in our application will synchronize with an
external single-threaded cpu-intensive process to process data (thus
using the tasktracker as a driver for these processes).
We need to make sure that each node is utilized at its maximum capacity
at all times by running 4 instances of those single-threaded processes,
and in order to achieve that we'd need each TaskTracker to be handed on
average 4 map tasks at a time, each to be run concurrently in a
different thread.

Is there a way to guarantee that this happens? Alternatively, we can
always run 4 TaskTrackers per node, which was our original plan, but if
there is a better/smarter way to do this, that would be the best
solution.

Thanks in advance!

Lorenzo Thione

On May 24, 2006, at 7:31 AM, Dennis Kubes wrote:

> Using Java 5 will allow the threads of various tasks to take
> advantage of multiple processors. Just make sure you set your map
> tasks property to a multiple of the number of processors total. We
> are running multi-core machines and are seeing good utilization
> across all cores this way.
>
> Dennis
>
> Gianlorenzo Thione wrote:
>> Hello everybody,
>>
>> I'll ask my first question on this forum and hopefully start
>> building more and more understanding of hadoop so that we can
>> eventually contribute actively. In the meantime, I have a simple
>> issue/question/suggestion....
>>
>> I have many multi-core, multi-processor nodes in my cluster and
>> I'd like to be able to run several tasktrackers and datanodes per
>> physical machine. I am modifying the startup scripts so that a
>> number of worker JVMs can be started on each node, maxed out at
>> the number of CPUs seen by the kernel.
>>
>> Since our map jobs are highly CPU intensive it makes sense to run
>> parallel jobs on each node, maximizing the CPU utilization.
>>
>> Is that something that would make sense to roll back into the
>> scripts for hadoop as well? Anybody else running on multi
>> processor architectures?
>>
>> Lorenzo Thione
>> Powerset, Inc.
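
For reference, the arithmetic behind the "multiple of the number of processors total" suggestion in this thread can be sketched as below. This is only an illustration under stated assumptions: `SlotMath`, `totalSlots`, and `roundUpToMultiple` are hypothetical names and are not part of Hadoop, and the sketch does not touch any Hadoop configuration. The node, core, and task counts in `main` are the ones mentioned in the message; whether every node actually runs tasks is an assumption.

```java
// Hypothetical helper, NOT part of Hadoop: it only illustrates the
// slot arithmetic discussed in the thread.
final class SlotMath {

    /** Concurrent task slots if every worker node runs one task per core. */
    static int totalSlots(int workerNodes, int coresPerNode) {
        if (workerNodes < 0 || coresPerNode < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return workerNodes * coresPerNode;
    }

    /**
     * Smallest positive multiple of {@code slots} that is at least
     * {@code requestedTasks}, so that each wave of tasks can fill every slot.
     */
    static int roundUpToMultiple(int requestedTasks, int slots) {
        if (requestedTasks < 0 || slots <= 0) {
            throw new IllegalArgumentException("invalid task or slot count");
        }
        int waves = (requestedTasks + slots - 1) / slots; // ceiling division
        return Math.max(1, waves) * slots;
    }

    public static void main(String[] args) {
        // Assumed inputs, taken from the message: 5 nodes, 4 cores each, 53 map tasks.
        int slots = totalSlots(5, 4);
        int mapTasks = roundUpToMultiple(53, slots);
        System.out.println("slots=" + slots + " suggested map tasks=" + mapTasks);
        // Cores visible to this JVM, the per-node figure the startup scripts would cap at.
        System.out.println("local cores=" + Runtime.getRuntime().availableProcessors());
    }
}
```

The rounding only makes the number of map tasks divide evenly across the available cores; it does not by itself make a tasktracker run several tasks at once, which is the open question in the message above.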