hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: architecture help
Date Mon, 16 Nov 2009 06:10:10 GMT
>> I would like the connection management to live separately
>> from the mapper instances per node.
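The JVM reuse option in Hadoop (mapred.job.reuse.jvm.num.tasks) might be
helpful for you in this case. With it enabled, a child JVM can run several
tasks of the same job one after another instead of exiting after each task,
so anything held in a static field, such as a connection pool, is created
once per JVM instead of once per task.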
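A minimal sketch of the static-pool idea, assuming JVM reuse is turned on for the job (for example `mapred.job.reuse.jvm.num.tasks=-1`, or `JobConf.setNumTasksToExecutePerJvm(-1)` in the old API, where -1 means no limit). `JvmSharedPool` and its methods are hypothetical, not part of Hadoop, and the generic `Supplier` stands in for whatever opens a real JDBC connection. Each concurrently running task slot still has its own JVM, so this caps connections per JVM, not per node.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical per-JVM pool (not a Hadoop API). The single instance lives in a
 * static field, so it is created once per task JVM (per classloader) and, with
 * JVM reuse enabled, survives from one task of the job to the next.
 */
final class JvmSharedPool<T> {
    private static JvmSharedPool<?> instance;

    private final BlockingQueue<T> idle = new LinkedBlockingQueue<>();
    private final AtomicInteger created = new AtomicInteger(); // items ever created
    private final int maxSize;
    private final Supplier<T> factory;

    private JvmSharedPool(int maxSize, Supplier<T> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
    }

    /** Returns the JVM-wide pool; the arguments are only used on the first call. */
    @SuppressWarnings("unchecked")
    static synchronized <U> JvmSharedPool<U> get(int maxSize, Supplier<U> factory) {
        if (instance == null) {
            instance = new JvmSharedPool<U>(maxSize, factory);
        }
        return (JvmSharedPool<U>) instance;
    }

    /** Reuses an idle item, creates one while under maxSize, else waits for a release. */
    T borrow() {
        T item = idle.poll();
        if (item != null) {
            return item;
        }
        while (true) {
            int n = created.get();
            if (n >= maxSize) {
                try {
                    return idle.take(); // cap reached: block until another caller releases one
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a connection", e);
                }
            }
            if (created.compareAndSet(n, n + 1)) {
                return factory.get();
            }
        }
    }

    /** Returns an item so later borrowers (including later tasks in this JVM) can reuse it. */
    void release(T item) {
        idle.offer(item);
    }

    int createdCount() {
        return created.get();
    }
}
```

Each task would call `JvmSharedPool.get(...)` when it starts, borrow a connection for its writes, and release it when it finishes. The pool is deliberately never closed here; its connections go away when the reused JVM finally exits.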


On 11/16/09 6:22 AM, "yz5od2" <woods5242-outdoors@yahoo.com> wrote:


a) I have a Mapper-only job: it reads in records, then parses them
apart. There is no reduce phase.

b) I would like this mapper job to save each record into a shared MySQL
database on the network.

c) I am running a 4-node cluster and am running out of connections
very quickly. That part I can address on the DB server side.
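A rough way to size this, assuming each running map task holds at most $c$ connections and releases them when it finishes: only tasks running at the same time hold connections, so the peak is bounded by the number of map slots rather than by the total number of tasks in the job. Taking Hadoop's default of 2 map slots per TaskTracker (`mapred.tasktracker.map.tasks.maximum`) and $c = 1$ as assumptions:

```latex
C_{\text{peak}} \le N_{\text{nodes}} \cdot S_{\text{map}} \cdot c = 4 \cdot 2 \cdot 1 = 8
```

By this estimate, even a job that runs thousands of map tasks over its lifetime should need only a handful of simultaneous connections; a count far above it would suggest that connections are opened per record or are not being closed.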

What I am trying to understand is this: does each mapper task instance
that processes an input split run in its own classloader? I am trying
to figure out how to manage a connection pool on each processing node,
so that all mapper instances use it to access the database. Right now
it appears that each node is creating thousands of mapper instances,
each with its own connection management, which is why this blows up so
quickly. I would like the connection management to live separately
from the mapper instances per node.
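Independently of JVM reuse, the per-task cost can be held to one connection per task by opening it when the task starts instead of per record. A minimal sketch, assuming the writer is created in the old-API `configure()` (or new-API `setup()`) hook and closed in `close()` (or `cleanup()`); `TaskScopedWriter` and its `insert` callback are hypothetical stand-ins for a JDBC connection and an `INSERT` statement:

```java
import java.util.function.BiConsumer;
import java.util.function.Supplier;

/**
 * Hypothetical mapper-side helper: opens ONE connection when the task starts,
 * reuses it for every record, and closes it when the task ends.
 */
final class TaskScopedWriter<C extends AutoCloseable> implements AutoCloseable {
    private final C connection;
    private final BiConsumer<C, String> insert; // hypothetical "store one record" action
    private long written;

    // Task start (configure()/setup()): acquire the connection once.
    TaskScopedWriter(Supplier<C> open, BiConsumer<C, String> insert) {
        this.connection = open.get();
        this.insert = insert;
    }

    // Per record (map()): reuse the same connection.
    void write(String record) {
        insert.accept(connection, record);
        written++;
    }

    long written() {
        return written;
    }

    // Task end (close()/cleanup()): release the connection exactly once.
    @Override
    public void close() {
        try {
            connection.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close connection", e);
        }
    }
}
```

In a real mapper the `insert` callback would typically execute a JDBC `PreparedStatement`; queuing rows with `addBatch()` and sending them with `executeBatch()` would also reduce round trips to MySQL.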

I hope I am explaining what I want to do clearly. Please let me know if
anyone has any thoughts, tips, best practices, or features I should look
at.

