hadoop-common-user mailing list archives

From Ian Upright <...@upright.net>
Subject Re: large memory tasks
Date Wed, 15 Jun 2011 22:26:04 GMT
Hi Matt,

My description was just for illustrative purposes.

For actual details of the implementation, it's an algorithm that does some
math over a number of vectors, which reside in a map of sorts.

Isn't a memcached lookup on the order of milliseconds, whereas a plain map
lookup takes microseconds?  I already have an implementation that is
multithreaded on a single machine, and I did some experimentation a long
time ago.  Switching the lookup to memcached or other lookup mechanisms
seemed to drop the efficiency by orders of magnitude.  This is obviously an
unacceptable loss.  It would be better to roll my own framework for
distributing this type of task than to incur it.
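
To put rough numbers on that gap, here is a minimal Python sketch.  The two
latencies are illustrative assumptions (about 1 microsecond for an in-process
map lookup, about 1 millisecond for a networked cache round trip), not
measurements from my implementation, and the function names are hypothetical.

```python
# Illustrative latencies only: assumptions, not measurements.
IN_PROCESS_LOOKUP_NS = 1_000      # ~1 microsecond per in-memory map lookup
REMOTE_LOOKUP_NS = 1_000_000      # ~1 millisecond per networked cache lookup

NS_PER_SECOND = 1_000_000_000


def lookups_per_second(latency_ns):
    """Sequential lookups a single thread can complete in one second."""
    return NS_PER_SECOND // latency_ns


def slowdown_factor(fast_ns, slow_ns):
    """How many times slower the slow lookup is than the fast one."""
    return slow_ns // fast_ns


if __name__ == "__main__":
    print("in-process lookups/s:", lookups_per_second(IN_PROCESS_LOOKUP_NS))
    print("remote lookups/s:    ", lookups_per_second(REMOTE_LOOKUP_NS))
    print("slowdown factor:     ",
          slowdown_factor(IN_PROCESS_LOOKUP_NS, REMOTE_LOOKUP_NS))
```

Under those assumed latencies, a lookup-bound inner loop loses three orders
of magnitude, which is the kind of drop I saw.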

However, maybe someone can shed some other light on the issue.  If one
machine finishes each task two or three times as fast as another machine,
is that a big problem?  (For example, a 2-core machine finishing slowly
while an 8-core machine finishes very quickly.)

There is also a further complication: I may be using the cluster for more
regular jobs, where I want multiple tasks per machine, and then need to
switch immediately to this other configuration, where I want just one task
per machine.  That is, solving one complete problem may require running
Hadoop tasks in a standard configuration, then switching to this
configuration, and then back again, repeatedly.

Ian

>Is the lookup table constant across each of the tasks? You could try putting it into memcached:
>
>http://hcil.cs.umd.edu/trs/2009-01/2009-01.pdf
>
>Matt
>
>-----Original Message-----
>From: Ian Upright [mailto:ian@upright.net] 
>Sent: Wednesday, June 15, 2011 3:42 PM
>To: common-user@hadoop.apache.org
>Subject: large memory tasks
>
>Hello, I'm quite new to Hadoop, so I'd like to get an understanding of
>something.
>
>Let's say I have a task that requires 16 GB of memory in order to execute.
>Let's say hypothetically it's some sort of big lookup table that
>needs that kind of memory.
>
>I could have 8 cores run the task in parallel (multithreaded), and all 8
>cores can share that 16 GB lookup table.
>
>On another machine, I could have 4 cores run the same task, and they still
>share that same 16 GB lookup table.
>
>Now, with my understanding of Hadoop, each task has its own memory.
>
>So if I have 4 tasks that run on one machine, and 8 tasks on another, then
>the 4 tasks need a 64 GB machine, and the 8 tasks need a 128 GB machine, but
>really, let's say I only have two machines, one with 4 cores and one with 8,
>each machine only having 24 GB.
>
>How can the work be evenly distributed among these machines?  Am I missing
>something?  What other ways can this be configured such that this works
>properly?
>
>Thanks, Ian
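
The arithmetic in the quoted message can be sketched in Python as follows.
The figures (a 16 GB table, 24 GB per machine, 4 and 8 cores) come from the
message itself; the function name is hypothetical, and the sketch only counts
the lookup table, ignoring any other per-task memory.

```python
TABLE_GB = 16      # size of the lookup table, from the message
MACHINE_GB = 24    # memory available on each machine, from the message


def memory_needed_gb(workers, table_gb, shared):
    """Memory for the lookup table alone.

    shared=True:  all workers are threads in one process, one table copy.
    shared=False: every worker is a separate task with its own copy.
    """
    return table_gb if shared else workers * table_gb


if __name__ == "__main__":
    for cores in (4, 8):
        for shared in (False, True):
            need = memory_needed_gb(cores, TABLE_GB, shared)
            verdict = "fits" if need <= MACHINE_GB else "does not fit"
            mode = "shared table" if shared else "table per task"
            print(f"{cores} cores, {mode}: {need} GB, "
                  f"{verdict} in {MACHINE_GB} GB")
```

With one copy per task, neither machine has enough memory; with a single
shared copy, both do.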
