hadoop-common-dev mailing list archives

From Owen O'Malley <o...@yahoo-inc.com>
Subject Re: What do people use Hadoop for?
Date Mon, 22 Jan 2007 21:48:54 GMT

On Jan 22, 2007, at 3:55 AM, Andrzej Bialecki wrote:

> Why is that? Is it because people don't know about it, or other  
> models for tackling out-of-core tasks are more popular, or it's not  
> applicable to most problems out there? I'm not sure. From my  
> experience I know that it's often not obvious (and sometimes  
> impossible?) how to decompose an existing algorithm so that it fits  
> in the map-reduce paradigm.

I would guess the primary reason is that until Hadoop came along, there wasn't a practical way to write or run map/reduce applications. Map/reduce is still pretty new; MPI was first released 12 years ago, and PVM is even older. A lot of the people in the supercomputing community are just learning about map/reduce.

> Do people use Hadoop for tasks outside the well-known class of web- 
> related problems?

One obvious application that uses map/reduce is log analysis. If you  
have gigabytes or terabytes of logs to scan and analyze, map/reduce  
helps a lot.
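As a toy illustration of that pattern (this is not the Hadoop API; the function names and log format here are made up), log analysis amounts to a map phase that emits (key, value) pairs and a reduce phase that aggregates them:

```python
from collections import defaultdict

def map_phase(lines):
    # Emit (status_code, 1) for each log line; in this illustrative
    # format, the status code is assumed to be the second field.
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[1], 1

def reduce_phase(pairs):
    # Sum the counts for each key; in a real cluster, each reduce task
    # would see all pairs for its keys after the shuffle.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

logs = [
    "/index.html 200",
    "/missing 404",
    "/index.html 200",
]
print(reduce_phase(map_phase(logs)))  # {'200': 2, '404': 1}
```

On a cluster the map phase runs in parallel over splits of the log files, which is where the win over a single machine comes from.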

In terms of non-obvious applications: for one of the Yahoo Hack Days, I wrote a distributed dancing links (http://en.wikipedia.org/wiki/Dancing_Links) solver and a pentomino (http://en.wikipedia.org/wiki/Pentomino) tiling framework based on it. In particular, I solved the 9x10 grid using the 18 one-sided pentominoes. The approach I used expanded the search tree to depth N and wrote out each of the possible prefixes as a line. At a depth of 5, my problem generated 150,000 prefixes. I used map/reduce with the text input format to split the input space into a bunch of pieces and generate all of the solutions for each prefix, with a single reduce to collect the solutions.
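A minimal sketch of that prefix-expansion step, assuming a fixed branching factor for simplicity (the real solver's branching varies with the board state, which is why the actual run produced 150,000 prefixes rather than a clean power):

```python
from itertools import product

# Illustrative only: enumerate every sequence of choices down to a
# fixed depth, one comma-separated prefix per line. Each map task then
# finishes the search below the prefixes in its input split.
DEPTH = 5
BRANCHES = range(6)   # hypothetical branching factor per level

def generate_prefixes(depth):
    for choices in product(BRANCHES, repeat=depth):
        yield ",".join(str(c) for c in choices) + ","

prefixes = list(generate_prefixes(DEPTH))
print(len(prefixes))   # 6**5 = 7776
print(prefixes[0])     # 0,0,0,0,0,
```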

So the input file looked like:
1,0,0,1,0,
1,0,0,1,1,
1,0,0,1,2,
1,0,0,1,3,
1,0,0,1,4,
1,0,0,1,5,
..

and the output looked like (each letter is a piece and the grid shows the layout):

1,0,0,1,1,
uuuiiiiit
uxupppttt
xxxwppPPt
vxwwffPPn
vwwffYlPn
vvvzfYlnn
yzzzYYlnL
yzZZFYllL
yyZFFFNNL
yZZFNNNLL

1,0,0,1,1,
uuuiiiiit
uxupppttt
xxxwppFNt
vxwwFFFNN
vwwllFnYN
vvvzlnnYN
yzzzlnYYL
yzZZlnfYL
yyZPPPffL
yZZPPffLL

...

I chose that particular problem because in Knuth's paper on dancing links (http://xxx.lanl.gov/PS_cache/cs/pdf/0011/0011047.pdf), on page 16, he started to solve it and determined it would take a couple of months using his solver. Using Hadoop on 20 nodes, it takes a couple of hours. Fun stuff. I should probably package up the solver and submit it to the Hadoop examples. As a side benefit, I also wrote a sudoku solver using my dancing links solver. There isn't any point in running the sudoku solver distributed, though, since it can solve huge puzzles (as in 42x42) in less than a second.

-- Owen
