From: yz5od2
To: common-user@hadoop.apache.org
Subject: Re: architecture help
Date: Mon, 16 Nov 2009 08:33:14 -0700

Thanks all for the replies, that makes sense. I think I am allocating connection resources per mapper instead of per task. How do I programmatically allocate a "pool" or shared resource for a task that all Mapper instances can access?

1) I have 4 nodes, each with a map capacity of 2, for a total of 8 tasks running simultaneously. The job I am running queues up ~950 tasks.

2) The MySQL server I am connecting to is configured to permit 300 connections.

3) Right now each Mapper instance opens its own connections when it starts. Obviously this is my problem, since each task must be spinning up dozens or hundreds of Mapper instances to process its split (is that right, or does one Mapper instance process an entire split?). I need to move the connection handling to the "task" level, but this is where I need some pointers on where to look.

When I submit my job, is there some way to say:

jobConf.setTaskHandlingClass(SomeClassThatCreatesThePoolThatTaskMapperInstancesAccess.class)

??
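For reference, a minimal sketch of per-task connection handling, assuming the org.apache.hadoop.mapreduce API: the framework creates one Mapper instance per map task and calls setup() once, map() once per input record in the split, and cleanup() once at the end (the older org.apache.hadoop.mapred API uses configure(JobConf) and close() for the same purpose). The sketch models that lifecycle in plain Java so it runs without Hadoop. FakeConnection and DbWritingMapperSketch are hypothetical stand-ins; a real job would open a java.sql.Connection (for example via DriverManager.getConnection) in setup() instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JDBC connection; counts how many are opened and still open.
class FakeConnection implements AutoCloseable {
    static int opened = 0;
    static int open = 0;
    final List<String> rows = new ArrayList<>();

    FakeConnection() {
        opened++;
        open++;
    }

    void insert(String row) {
        rows.add(row);
    }

    @Override
    public void close() {
        open--;
    }
}

// Hypothetical mapper following the setup()/map()/cleanup() lifecycle:
// one connection per task, reused for every record in the split.
class DbWritingMapperSketch {
    private FakeConnection conn;

    void setup() {
        conn = new FakeConnection();   // once per task
    }

    void map(String record) {
        conn.insert(record);           // once per record, same connection
    }

    void cleanup() {
        conn.close();                  // once per task
    }

    // Simplified driver: roughly what the framework does for one map task.
    static int runTask(List<String> split) {
        DbWritingMapperSketch mapper = new DbWritingMapperSketch();
        mapper.setup();
        try {
            for (String record : split) {
                mapper.map(record);
            }
        } finally {
            mapper.cleanup();
        }
        return mapper.conn.rows.size();
    }
}
```

With one connection per task, the cluster described above holds roughly 8 connections at a time (4 nodes × 2 map slots), far below the 300-connection limit, no matter how many of the ~950 tasks are still queued.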
On Nov 15, 2009, at 7:57 PM, Jeff Zhang wrote:

> Each map task will run in a separate JVM. So you should create a
> connection pool for each task, and all the mapper instances in one task
> share the same connection pool.
>
> Another suggestion is that you can use JNDI to manage the connection. It
> can be shared by all the map tasks in your cluster.
>
> Jeff Zhang
>
> On Mon, Nov 16, 2009 at 8:52 AM, yz5od2 wrote:
>
>> Hi,
>>
>> a) I have a Mapper-only job: it reads in records, then parses them
>> apart. There is no reduce phase.
>>
>> b) I would like this mapper job to save the records into a shared MySQL
>> database on the network.
>>
>> c) I am running a 4-node cluster and am obviously running out of
>> connections very quickly; that is something I can work on on the DB
>> server side.
>>
>> What I am trying to understand is whether each mapper task instance
>> that is processing an input split runs in its own classloader. I guess I
>> am trying to figure out how to manage a connection pool on each
>> processing node, so that all mapper instances would use it to get access
>> to the database. Right now it appears that each node is creating
>> thousands of mapper instances, each with its own connection management,
>> and this is blowing up quite quickly. I would like the connection
>> management to live separately from the mapper instances on each node.
>>
>> I hope I am explaining what I want to do OK. Please let me know if
>> anyone has any thoughts, tips, best practices, features I should look
>> at, etc.
>>
>> thanks
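Jeff's suggestion of one pool per task JVM can be sketched with a lazily created static holder: every Mapper instance loaded in the same JVM (and classloader) sees the same static field, so they all share one pool. This is a minimal sketch using only java.util.concurrent; SimplePool, PerJvmPool and the pool size of 2 are hypothetical, and the pooled String values stand in for real JDBC connections. Since each task normally gets its own JVM, this is effectively one pool per task; if JVM reuse is enabled (mapred.job.reuse.jvm.num.tasks), later tasks run in that JVM would reuse the same pool.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical minimal pool: a fixed number of resources handed out and returned.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());   // create all resources up front
        }
    }

    T borrow() {
        try {
            return idle.take();        // blocks until a resource is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
    }

    void giveBack(T item) {
        idle.offer(item);              // return the resource for the next borrower
    }

    int available() {
        return idle.size();
    }
}

// Hypothetical holder: one pool per JVM, created on first use.
final class PerJvmPool {
    static final int POOL_SIZE = 2;    // assumption: small pool per task JVM
    static int created = 0;            // counts resources created, for illustration
    private static SimplePool<String> pool;

    private PerJvmPool() {}

    static synchronized SimplePool<String> get() {
        if (pool == null) {
            // Strings stand in for real connections in this sketch.
            pool = new SimplePool<>(POOL_SIZE, () -> "conn-" + (++created));
        }
        return pool;
    }
}
```

A mapper would call PerJvmPool.get().borrow() in setup() and giveBack() in cleanup(). Closing the pooled connections when the JVM exits is left out of this sketch.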