hadoop-common-user mailing list archives

From Ricky Ho <rickyphyl...@yahoo.com>
Subject RE: breadth-first search
Date Wed, 22 Dec 2010 18:46:12 GMT
You can do whatever you want (including spawning threads) in the Mapper process 
(which is fork/exec'd by the TaskTracker).  But this doesn't help.

I think you need to understand the fundamental difference between the 2 parallel 
processing models

1) Multi-threading
Small-scale parallelism, limited to the number of cores within a single machine.
Multiple execution threads with a shared memory model, and a lot of 
synchronization primitives to coordinate access to shared data.
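
To make the shared-memory model concrete, here is a minimal sketch in Python 
(not Hadoop code; the function name count_with_threads is just an illustration). 
Several threads update one shared counter, and a lock coordinates access to it:

```python
import threading


def count_with_threads(num_threads: int, increments_per_thread: int) -> int:
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments_per_thread):
            # The lock is the synchronization primitive: only one thread
            # at a time may read-modify-write the shared counter.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

All threads live in one process on one machine, which is why this model stops 
scaling at the number of cores available there.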

2) Map/Reduce
Large scale parallelism involving large number of machines (hundreds to 
Data is shuffled through 2 layers of machines via a special topology (output 
records with the same key from layer 1 will land in the same place at layer 2).

The first layer performs a per-record transformation (map) and the second 
layer performs a consolidation (reduce).
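
As a rough illustration of the two layers, the following Python sketch simulates 
map, shuffle and reduce in memory and uses them for one round of breadth-first 
search. It is not Hadoop code: run_map_reduce, bfs_mapper, bfs_reducer and 
bfs_step are hypothetical names, and the record layout (node, (distance, 
neighbors)) is an assumption made for the example.

```python
from collections import defaultdict

INF = float("inf")


def run_map_reduce(records, mapper, reducer):
    """Simulate the two layers: per-record map, group by key, then reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            # Values with the same key always land in the same bucket.
            shuffled[key].append(value)
    return {key: reducer(key, values) for key, values in shuffled.items()}


def bfs_mapper(record):
    node, (dist, neighbors) = record
    # Pass the node's own state through so the adjacency list is not lost.
    yield node, (dist, neighbors)
    if dist != INF:
        for nbr in neighbors:
            # Candidate distance for each neighbor of a reached node.
            yield nbr, (dist + 1, None)


def bfs_reducer(node, values):
    best = min(d for d, _ in values)
    neighbors = next((n for _, n in values if n is not None), [])
    return (best, neighbors)


def bfs_step(state):
    """One map/reduce round: the frontier advances by one hop."""
    return run_map_reduce(state.items(), bfs_mapper, bfs_reducer)
```

Calling bfs_step repeatedly until the distances stop changing gives the BFS 
distances from the source. Note that no locking appears anywhere: each mapper 
call sees only its own record, and each reducer call sees only its own key.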

These 2 models have very different notions of synchronization: one uses 
fine-grained locking and the other is shared-nothing, so you shouldn't consider 
them alternatives to each other.  They are intended to solve very very different 

Can they be used together?  Absolutely yes.  But you need to design how you 
want to partition your problem ...

For example, you can consider partitioning your graph into sub-graphs so each 
Mapper/Reducer is dealing with a bigger sub-graph rather than individual nodes.  
Of course you need to think about how to combine the sub-graph results, and 
whether you need an absolutely accurate answer or an approximation is good 
enough.  I bet yours is the latter case, which should make it easier.
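
One possible way to sketch the sub-graph idea in Python (again an in-memory 
simulation, not Hadoop code; local_bfs and partitioned_bfs are hypothetical 
names). It assumes a directed graph stored as a dict in which every node appears 
as a key, and a list of node sets as the partitioning. Each "map" task relaxes 
distances inside its own sub-graph until nothing changes and emits candidate 
distances for nodes owned by other partitions; the "reduce" step combines the 
results by keeping the minimum per node. This variant computes exact distances; 
an approximate version would be a different design.

```python
from collections import deque

INF = float("inf")


def local_bfs(owned, graph, dist):
    """Relax distances inside one sub-graph.

    Returns (distances of owned nodes, candidate distances for foreign nodes).
    """
    local = {n: dist[n] for n in owned}
    queue = deque(n for n in owned if local[n] != INF)
    outgoing = {}
    while queue:
        node = queue.popleft()
        cand = local[node] + 1
        for nbr in graph[node]:
            if nbr in local:
                if cand < local[nbr]:
                    local[nbr] = cand
                    queue.append(nbr)  # re-process after an improvement
            else:
                # Edge crosses into another partition: emit a candidate.
                outgoing[nbr] = min(outgoing.get(nbr, INF), cand)
    return local, outgoing


def partitioned_bfs(graph, partitions, source):
    """Repeat map (per sub-graph) and reduce (min per node) until stable."""
    dist = {n: INF for n in graph}
    dist[source] = 0
    rounds = 0
    while True:
        rounds += 1
        results = [local_bfs(owned, graph, dist) for owned in partitions]
        new_dist = dict(dist)
        for local, outgoing in results:
            for node, d in list(local.items()) + list(outgoing.items()):
                if d < new_dist[node]:
                    new_dist[node] = d
        if new_dist == dist:
            return dist, rounds
        dist = new_dist
```

With this layout the number of rounds grows with how many times a shortest path 
crosses partition boundaries, rather than with its length in hops, so a good 
partitioning means fewer map/reduce jobs.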


Can you point me to matrix algorithms that are tuned for sparse graphs?  What I 
mean is going from O(v^3) to O(v*e), where v = number of vertices and e = number of 


-----Original Message-----
From: Peng, Wei [mailto:Wei.Peng@xerox.com] 
Sent: Wednesday, December 22, 2010 8:58 AM
To: common-user@hadoop.apache.org
Subject: RE: breadth-first search
Can someone tell me whether we can run multiple threads in Hadoop?
-----Original Message-----
From: Peng, Wei [mailto:Wei.Peng@xerox.com] 
Sent: Tuesday, December 21, 2010 9:07 PM
To: common-user@hadoop.apache.org
Subject: RE: breadth-first search
I was just trying to run 100 source nodes in multiple threads, but the
MapReduce tasks still appear to run sequentially.
Do I need to configure Hadoop somehow for multiple threads? Assign more
task slots? How?

