hadoop-common-user mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: architecture help
Date Mon, 16 Nov 2009 16:49:44 GMT
The easiest way is to make your connection pool a static member of your
mapper class.
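
For example, here is a minimal sketch of that idea. The class name, table,
host, and credentials are made up for illustration, and it assumes
commons-dbcp is on the task classpath:

    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import org.apache.commons.dbcp.BasicDataSource;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RecordMapper extends Mapper<LongWritable, Text, Text, Text> {

        // Static, so there is exactly one pool per task JVM; every mapper
        // instance created inside this task shares it.
        private static final BasicDataSource POOL = new BasicDataSource();

        static {
            POOL.setDriverClassName("com.mysql.jdbc.Driver");
            POOL.setUrl("jdbc:mysql://dbhost:3306/mydb"); // hypothetical host/db
            POOL.setUsername("hadoop");
            POOL.setPassword("secret");
            // 8 concurrent tasks x 5 connections = 40, well under a
            // 300-connection limit on the server.
            POOL.setMaxActive(5);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            try {
                Connection conn = POOL.getConnection();
                try {
                    PreparedStatement ps = conn.prepareStatement(
                            "INSERT INTO records (line) VALUES (?)");
                    ps.setString(1, value.toString());
                    ps.executeUpdate();
                    ps.close();
                } finally {
                    conn.close(); // returns the connection to the pool
                }
            } catch (SQLException e) {
                throw new IOException("insert failed", e);
            }
        }
    }

Because each map task runs in its own JVM, a static field is effectively
per-task state: one pool per task, shared by every mapper instance the
task creates.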


Jeff Zhang


On Mon, Nov 16, 2009 at 7:33 AM, yz5od2 <woods5242-outdoors@yahoo.com> wrote:

> Thanks all for the replies; that makes sense. I think I am allocating
> connection resources per mapper instead of per task.
>
> How do I programmatically allocate a "pool" or shared resource for a
> task that all Mapper instances can access?
>
> 1) I have 4 nodes, each with a map capacity of 2, for a total of 8
> tasks running simultaneously. The job I am running is queuing up ~950
> tasks that need to be done.
>
> 2) The mysql server I am connecting to is configured to permit 300
> connections.
>
> 3) When a Mapper instance starts, right now each mapper instance
> handles its own connections. Obviously this is my problem, as each task
> must be spinning up dozens/hundreds of mapper instances to process the
> task (is that right, or does one mapper instance process an entire
> split?). I need to move this to the "task", but this is where I need
> some pointers on where to look.
>
> When I submit my job, is there some way to say:
>
>
> jobConf.setTaskHandlingClass(SomeClassThatCreatesThePoolThatTaskMapperInstancesAccess.class)
>
> ??
>
>
> On Nov 15, 2009, at 7:57 PM, Jeff Zhang wrote:
>
>> Each map task will run in a separate JVM, so you should create a
>> connection pool for each task, and all the mapper instances in one
>> task will share the same connection pool.
>>
>> Another suggestion is that you can use JNDI to manage the connections.
>> It can be shared by all the map tasks in your cluster.
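>>
>> A rough sketch of the lookup side (the JNDI name "jdbc/mydb" and the
>> naming service that binds a pooled DataSource under it are
>> assumptions; Hadoop itself does not provide one):
>>
>>     import java.sql.Connection;
>>     import javax.naming.InitialContext;
>>     import javax.sql.DataSource;
>>
>>     public class PooledDb {
>>         // Looks up a DataSource that an external naming service has
>>         // bound under the (hypothetical) name "jdbc/mydb"; every
>>         // task JVM can perform the same lookup.
>>         public static Connection getConnection() throws Exception {
>>             InitialContext ctx = new InitialContext();
>>             DataSource ds = (DataSource) ctx.lookup("jdbc/mydb");
>>             return ds.getConnection();
>>         }
>>     }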
>>
>>
>> Jeff Zhang
>>
>> On Mon, Nov 16, 2009 at 8:52 AM, yz5od2 <woods5242-outdoors@yahoo.com> wrote:
>>
>>> Hi,
>>>
>>> a) I have a Mapper-only job: it reads in records and parses them
>>> apart. There is no reduce phase.
>>>
>>> b) I would like this mapper job to save each record into a shared
>>> mysql database on the network.
>>>
>>> c) I am running a 4-node cluster and am obviously running out of
>>> connections very quickly; that is something I can work on from the db
>>> server side.
>>>
>>> What I am trying to understand is: for each mapper task instance that
>>> is processing an input split, does that run in its own classloader? I
>>> am trying to figure out how to manage a connection pool on each
>>> processing node, so that all mapper instances would use it to get
>>> access to the database. Right now it appears that each node is
>>> creating thousands of mapper instances, each with its own connection
>>> management; hence this is blowing up quite quickly. I would like the
>>> connection management to live separately from the mapper instances on
>>> each node.
>>>
>>> I hope I am explaining what I want to do OK; please let me know if
>>> anyone has any thoughts, tips, best practices, features I should look
>>> at, etc.
>>>
>>> thanks
>>>
>
