hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GUOJUN Zhu <guojun_...@freddiemac.com>
Subject RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
Date Mon, 07 May 2012 14:54:01 GMT
We are using old API of 0.20.  I think when you set "mapred.reduce.tasks" 
with certain number N and use fileinputformat, the default behavior is 
that any file will be split into that number, N, each split smaller than 
the default block size. Of course, other restriction, such as 
"mapred.min.split.size" cannot be set too large (default is as small as 
possible I think). 

Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_zhu@freddiemac.com
Financial Engineering
Freddie Mac



   sharat attupurath <sharat_a@hotmail.com> 
   05/05/2012 11:37 AM
   Please respond to
mapreduce-user@hadoop.apache.org


To
<mapreduce-user@hadoop.apache.org>
cc

Subject
RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop






Since the input files are very small, the default input formats in Hadoop 
all generate just a single InputSplit, so only a single map task is 
executed, and we wont have much parallelism. 

I was thinking of writing an InputFormat that would read the whole file as 
an InputSplit and replicate this input split n times (where n would be the 
number of ants in a single stage) so that we'll have n mappers.
Also I want the input format to return the value as the adjacency matrix 
of the graph (calculating it from the coordinates in the input file). 

But I can't find a way to do this. Is it possible? Or is it better to just 
have the input as Text and create the adjacency matrix in the mappers?

Date: Sat, 5 May 2012 08:20:34 -0700
Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in 
Hadoop
From: lordjoe2000@gmail.com
To: mapreduce-user@hadoop.apache.org

yes - if you know how you can put it in distributed cache or if it is 
small put in as a String in the config or have all InputFormats read it 
from somewhere

On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <sharat_a@hotmail.com> 
wrote:
I looked at both the files. in AbstractNShotInputFormat it is mentioned 
that this input format does not read from files. My input is in a text 
file. I want the whole file as a single record. So is it enough if i just 
copy the contents of the file and return it as a string from 
getValueFromIndex() ? 

Date: Fri, 4 May 2012 13:15:46 -0700

Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in 
Hadoop
From: lordjoe2000@gmail.com
To: mapreduce-user@hadoop.apache.org

Look at NShotInputFormat 
call setNumberKeys to set the number of ants then
change the method getValueFromIndex to return the text representing the 
original problem to each mapper -
what happens in the second and third jobs is an exercise but the saved 
test needs to have pheremones and the original problem 


On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <sharat_a@hotmail.com> 
wrote:
Hi,

Thanks a lot for the quick reply sir! We are new to Apache Hadoop and 
haven't yet understood it properly yet. Can you please elaborate on how we 
can have multiple stages of mapreduce jobs for combining the trails as you 
have mentioned? 

We have been trying to find out how to write a custom splitter and almost 
all online resources tell to subclass the FileInputFormat and write only a 
custom RecordReader? Will it be possible to generate our splits in that 
way?

Regards

Sharat

Date: Fri, 4 May 2012 09:22:51 -0700
Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in 
Hadoop
From: lordjoe2000@gmail.com
To: mapreduce-user@hadoop.apache.org




On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <sharat_a@hotmail.com> 
wrote:
Hi,

We are trying to parallelize the ant colony optimization algorithm for TSP 
over hadoop and are facing some issues. We are using TSPLIB as input 
files. The input is a text file containing eucledian coordinates of the 
cities - first column is city number and the next two columns contain the 
x and y coordinates respectively. 

What we intend to do is to take input from this single file, send copies 
of it to multiple mappers (each mapper acts like the ant in the 
algorithm), each mapper works on its input to find its own TSP solution 
that it outputs and finally the reducer outputs the smallest tour found by 
the mappers. Hope we are in the right track. Here are the issues:

1) Since the input file is small, we need to force hadoop to fire up 
multiple map tasks by replicating the input. How can we make an InputSplit 
of the whole file and replicate it so that the input can be sent to 
multiple mappers?
Write a custom splitter sending the same data to all mappers - the only 
critical criteria is the number of "Ants" 

2) the algorithm uses a shared pheromone array and each mapper needs to 
read and write data from this. How can we share the pheromone data across 
the mappers.
You cannot share data across mappers and should not attempt to do so - 
Better to use the reducer(s) to combine trails  combine first pass trails 
and
then pass the combined trails to another mapreduce job - this with the 
original problem plus the current pheremone trails

Hope the questions are clear enough. Any help would be greatly 
appreciated. 

Thank you

Regards

Sharat 



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com





-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com





-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com



Mime
View raw message