hadoop-mapreduce-user mailing list archives

From sharat attupurath <shara...@hotmail.com>
Subject RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
Date Tue, 08 May 2012 16:25:05 GMT

I followed an old tutorial, it seems, so my reducer implements org.apache.hadoop.mapred.Reducer

Date: Tue, 8 May 2012 09:22:20 -0700
Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
From: lordjoe2000@gmail.com
To: mapreduce-user@hadoop.apache.org

Of course, but do your reducers subclass org.apache.hadoop.mapreduce.Reducer or org.apache.hadoop.mapred.Reducer?
 

On Tue, May 8, 2012 at 8:32 AM, sharat attupurath <sharat_a@hotmail.com> wrote:





I am using Hadoop 0.20.203

Date: Tue, 8 May 2012 07:47:23 -0700
Subject: RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
From: lordjoe2000@gmail.com

To: mapreduce-user@hadoop.apache.org

Which API are you using? They changed between 0.18 and 0.20. This is the more recent version.


On May 8, 2012 3:55 AM, "sharat attupurath" <sharat_a@hotmail.com> wrote:






Hi Steve,
I tried using the NShotInputFormat, but after setting it as the InputFormat class and running
my mapreduce job I get the following error:
Exception in thread "main" java.lang.RuntimeException: class tsphadoop.NShotInputFormat not org.apache.hadoop.mapred.InputFormat
	at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:915)
	at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:590)
	at tsphadoop.TspHadoop.main(TspHadoop.java:40)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
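[Editor's note: the error arises because the old-API JobConf.setInputFormat requires a subclass of org.apache.hadoop.mapred.InputFormat, so a class built against the new-API org.apache.hadoop.mapreduce package fails the check. A minimal sketch of the kind of assignability test Configuration.setClass performs - the classes below are stand-ins, not Hadoop code:]

```java
// Sketch of the check behind "class X not Y": the configured class
// must be assignable to the expected interface, or setClass throws.
public class SetClassCheck {
    static void setClass(Class<?> theClass, Class<?> xface) {
        if (!xface.isAssignableFrom(theClass)) {
            throw new RuntimeException("class " + theClass.getName()
                    + " not " + xface.getName());
        }
    }

    public static void main(String[] args) {
        // A class implementing the expected interface passes...
        setClass(java.util.ArrayList.class, java.util.List.class);
        // ...but an unrelated hierarchy fails, just as a new-API
        // InputFormat fails the old-API mapred.InputFormat check.
        try {
            setClass(java.util.ArrayList.class, java.util.Map.class);
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```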



Am I missing something? Is there anything else I should know about using NShotInputFormat?
Thanks and regards
Sharat


Date: Mon, 7 May 2012 09:24:05 -0700
Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop
From: lordjoe2000@gmail.com
To: mapreduce-user@hadoop.apache.org



Fair enough - I write a lot of InputFormats, since for most of my problems a line of text is
not the proper unit. I read FASTA files (read lines until you hit a line starting with >)
and XML fragments (read until you hit a closing tag).
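[Editor's note: the record-boundary logic described above, stripped of the Hadoop RecordReader plumbing, is a sketch like this - group lines into FASTA records, each starting at a > header line:]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split lines into FASTA records; each record starts at a '>' header line.
public class FastaRecords {
    public static List<String> records(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (line.startsWith(">")) {            // header line starts a new record
                if (current != null) records.add(current.toString());
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line); // sequence line joins current record
            }
        }
        if (current != null) records.add(current.toString());
        return records;
    }
}
```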




On Mon, May 7, 2012 at 9:03 AM, GUOJUN Zhu <guojun_zhu@freddiemac.com> wrote:





The default FileInputFormat splits the
file according to size.  If you use line-oriented text data, TextInputFormat
respects the line structure of the input.   We got splits as small as
a few KBs.  File splitting is a tricky business, especially when you
want it to respect your logical boundaries. It is better to use the existing
battle-tested code than to reinvent the wheel.



Zhu, Guojun

Modeling Sr Graduate

571-3824370

guojun_zhu@freddiemac.com

Financial Engineering

Freddie Mac









From: Steve Lewis <lordjoe2000@gmail.com>
Date: 05/07/2012 11:17 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in Hadoop


Yes, but it is the job of the InputFormat code to implement
the behavior - it is not necessary to do so, and in other cases I choose
to create more mappers when the mapper has a lot of work.



On Mon, May 7, 2012 at 7:54 AM, GUOJUN Zhu <guojun_zhu@freddiemac.com>
wrote:



We are using the old API of 0.20.  I think when you set "mapred.map.tasks"
to a certain number N and use FileInputFormat, the default behavior is
that any file will be split into that number, N, of splits, each split smaller than
the default block size. Of course, other restrictions, such as "mapred.min.split.size",
cannot be set too large (the default is as small as possible, I think).
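[Editor's note: for reference, the split size the old-API FileInputFormat computes is roughly max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the requested number of map tasks - a sketch of that rule:]

```java
// Sketch of the old-API FileInputFormat split-size rule:
// splitSize = max(minSize, min(goalSize, blockSize)),
// where goalSize = totalSize / requested number of map tasks.
public class SplitSize {
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // 64 MB default block
        long totalSize = 10L * 1024;         // a 10 KB input file
        long goalSize  = totalSize / 5;      // ask for 5 map tasks
        // Small files yield small splits, as long as mapred.min.split.size stays small.
        System.out.println(computeSplitSize(goalSize, 1, blockSize)); // 2048
    }
}
```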













From: sharat attupurath <sharat_a@hotmail.com>
Date: 05/05/2012 11:37 AM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Ant Colony Optimization for Travelling Salesman Problem in Hadoop


Since the input files are very small, the default input formats in Hadoop
all generate just a single InputSplit, so only a single map task is executed,
and we won't have much parallelism.



I was thinking of writing an InputFormat that would read the whole file
as an InputSplit and replicate this input split n times (where n would
be the number of ants in a single stage) so that we'll have n mappers.

Also I want the input format to return the value as the adjacency matrix
of the graph (calculating it from the coordinates in the input file). 



But I can't find a way to do this. Is it possible? Or is it better to just
have the input as Text and create the adjacency matrix in the mappers?
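[Editor's note: whichever place it lives - InputFormat or mapper - computing the adjacency matrix from the Euclidean coordinates is straightforward; a sketch:]

```java
// Sketch: Euclidean distance (adjacency) matrix from TSPLIB-style coordinates,
// where xy[i] = {x, y} for city i.
public class Adjacency {
    public static double[][] matrix(double[][] xy) {
        int n = xy.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double dx = xy[i][0] - xy[j][0];
                double dy = xy[i][1] - xy[j][1];
                d[i][j] = Math.sqrt(dx * dx + dy * dy); // straight-line distance
            }
        }
        return d;
    }
}
```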




Date: Sat, 5 May 2012 08:20:34 -0700

Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in
Hadoop

From: lordjoe2000@gmail.com

To: mapreduce-user@hadoop.apache.org



Yes - if you know how, you can put it in the distributed cache; or, if it is small,
put it in as a String in the config; or have all InputFormats read it from
somewhere.



On Sat, May 5, 2012 at 8:08 AM, sharat attupurath <sharat_a@hotmail.com>
wrote: 

I looked at both the files. In AbstractNShotInputFormat it is mentioned
that this input format does not read from files. My input is in a text
file, and I want the whole file as a single record. So is it enough if I just
copy the contents of the file and return it as a string from getValueFromIndex()?
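[Editor's note: if getValueFromIndex behaves as described - one value per key index, independent of files - then returning the cached file text for every index hands each mapper the whole problem. A hypothetical sketch of that behavior; the real NShotInputFormat API may differ:]

```java
// Hypothetical sketch: every index (every "ant") receives the same whole-file text.
public class WholeFileValues {
    private final String fileText;
    private final int numberKeys;

    public WholeFileValues(String fileText, int numberKeys) {
        this.fileText = fileText;
        this.numberKeys = numberKeys;
    }

    public String getValueFromIndex(int index) {
        if (index < 0 || index >= numberKeys)
            throw new IndexOutOfBoundsException("index " + index);
        return fileText; // identical problem text for each mapper
    }
}
```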




Date: Fri, 4 May 2012 13:15:46 -0700




Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in
Hadoop

From: lordjoe2000@gmail.com

To: mapreduce-user@hadoop.apache.org



Look at NShotInputFormat 

Call setNumberKeys to set the number of ants, then

change the method getValueFromIndex to return the text representing the
original problem to each mapper.

What happens in the second and third jobs is an exercise, but the saved
text needs to have the pheromones and the original problem.





On Fri, May 4, 2012 at 9:54 AM, sharat attupurath <sharat_a@hotmail.com>
wrote: 

Hi,



Thanks a lot for the quick reply, sir! We are new to Apache Hadoop and haven't
understood it properly yet. Can you please elaborate on how we can
have multiple stages of mapreduce jobs for combining the trails as you
have mentioned?



We have been trying to find out how to write a custom splitter, and almost
all online resources say to subclass FileInputFormat and write only
a custom RecordReader. Will it be possible to generate our splits in that
way?



Regards



Sharat




Date: Fri, 4 May 2012 09:22:51 -0700

Subject: Re: Ant Colony Optimization for Travelling Salesman Problem in
Hadoop

From: lordjoe2000@gmail.com

To: mapreduce-user@hadoop.apache.org










On Fri, May 4, 2012 at 9:06 AM, sharat attupurath <sharat_a@hotmail.com>
wrote: 

Hi,



We are trying to parallelize the ant colony optimization algorithm for
TSP over Hadoop and are facing some issues. We are using TSPLIB as input
files. The input is a text file containing Euclidean coordinates of the
cities - the first column is the city number and the next two columns contain the
x and y coordinates respectively.



What we intend to do is take input from this single file, send copies
of it to multiple mappers (each mapper acts like an ant in the algorithm),
each mapper works on its input to find its own TSP solution that it outputs,
and finally the reducer outputs the smallest tour found by the mappers.
Hope we are on the right track. Here are the issues:



1) Since the input file is small, we need to force Hadoop to fire up multiple
map tasks by replicating the input. How can we make an InputSplit of the
whole file and replicate it so that the input can be sent to multiple mappers?


Write a custom splitter sending the same data to all mappers - the only
critical criterion is the number of "ants".
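[Editor's note: the idea of such a splitter, stripped of the Hadoop InputSplit plumbing, is just "emit the same payload N times" - a sketch:]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: a "splitter" that hands every one of N mappers the identical payload,
// so the split count is driven by the number of ants, not the file size.
public class ReplicatedSplits {
    public static List<String> splits(String wholeInput, int numAnts) {
        return new ArrayList<>(Collections.nCopies(numAnts, wholeInput));
    }
}
```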



2) The algorithm uses a shared pheromone array, and each mapper needs to
read and write data from this. How can we share the pheromone data across
the mappers?

You cannot share data across mappers and should not attempt to do so.

Better to use the reducer(s) to combine the first-pass
trails, and

then pass the combined trails to another mapreduce job - this one with the
original problem plus the current pheromone trails.
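[Editor's note: a sketch of what that combining reducer might do. The combination rule here - summing each ant's deposits and keeping the shortest tour - is an assumption for illustration; real ACO variants also evaporate pheromone between iterations:]

```java
import java.util.List;

// Sketch: combine per-ant pheromone deposits and pick the best tour.
public class CombineTrails {
    // Element-wise sum of each ant's pheromone deposit matrix (assumed rule).
    public static double[][] combinePheromone(List<double[][]> deposits) {
        double[][] out = null;
        for (double[][] d : deposits) {
            if (out == null) out = new double[d.length][d.length];
            for (int i = 0; i < d.length; i++)
                for (int j = 0; j < d.length; j++)
                    out[i][j] += d[i][j];
        }
        return out;
    }

    // The reducer's other job: index of the shortest tour found by any mapper.
    public static int bestTour(double[] tourLengths) {
        int best = 0;
        for (int i = 1; i < tourLengths.length; i++)
            if (tourLengths[i] < tourLengths[best]) best = i;
        return best;
    }
}
```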




Hope the questions are clear enough. Any help would be greatly appreciated.




Thank you



Regards



Sharat 







-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com








































