incubator-hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "IOSystem" by thomasjungblut
Date Sun, 11 Dec 2011 00:07:42 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "IOSystem" page has been changed by thomasjungblut:
http://wiki.apache.org/hama/IOSystem?action=diff&rev1=1&rev2=2

+ <<TableOfContents(5)>>
+ 
+ == General Information ==
+ 
+ Since Hama 0.4.0 we provide an input and output system for BSP jobs.
+ 
+ TODO: Describe the key/value model.
+ TODO: Document what happens when no input is configured.
+ 
+ 
+ == Input ==
+ 
+ === Configuring Input ===
+ 
+ When setting up a BSPJob, you can provide an InputFormat and a Path that points to the input.
+ 
+ {{{
+     BSPJob job = new BSPJob();
+     // detail stuff omitted
+     job.setInputPath(new Path("/tmp/test.seq"));
+     job.setInputFormat(org.apache.hama.bsp.SequenceFileInputFormat.class);
+ }}}
+ 
+ Another way to add an input path is the following:
+ {{{ 
+    SequenceFileInputFormat.addInputPath(job, new Path("/tmp/test.seq"));
+ }}}
+ 
+ You can also add multiple paths by using this method:
+ 
+ {{{ 
+    SequenceFileInputFormat.addInputPaths(job, "/tmp/test.seq,/tmp/test2.seq,/tmp/test3.seq");
+ }}}
+ 
+ '''Note that these paths must be separated by a comma.'''
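
To make the expected argument format concrete, here is a minimal sketch in plain Java of how a comma-separated path list breaks down into individual paths. The class and method names are hypothetical and this is not Hama's own parser; it also assumes there is no whitespace around the commas.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: a picture of the argument format, not Hama's implementation.
final class InputPathListSketch {

    // Splits a comma-separated path list into the individual path strings.
    static List<String> splitPaths(String commaSeparatedPaths) {
        return Arrays.asList(commaSeparatedPaths.split(","));
    }

    public static void main(String[] args) {
        for (String path : splitPaths("/tmp/test.seq,/tmp/test2.seq,/tmp/test3.seq")) {
            System.out.println(path);
        }
    }
}
```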
+ 
+ With a {{{SequenceFileInputFormat}}}, the key and value types are read from the file header.
+ 
+ When you want to read a plain text file with {{{TextInputFormat}}}, the key is always a
{{{LongWritable}}} holding the number of bytes read so far, and the value is a {{{Text}}} holding
one line of your input.
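
The key/value shape described above can be pictured with a small stand-alone sketch. It assumes the key is the byte offset at which each line starts (the convention of Hadoop's TextInputFormat) and that lines end with a single `\n`. The `Record` type and `toRecords` helper are hypothetical and use plain Java types instead of {{{LongWritable}}} and {{{Text}}}.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java stand-ins for (LongWritable, Text) records.
final class TextRecordSketch {

    // One record: the byte offset where the line starts, and the line itself.
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Splits the content on '\n' and pairs each line with its starting byte offset.
    static List<Record> toRecords(String content) {
        List<Record> records = new ArrayList<>();
        if (content.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new Record(offset, line));
            // Advance past the UTF-8 bytes of the line plus the newline terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        for (Record r : toRecords("first\nsecond line\nthird\n")) {
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

For the three-line sample in `main`, the keys come out as 0, 6 and 18.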
+ 
+ 
+ === Using Input ===
+ 
+ You can now read the input from any method of the {{{BSP}}} class that takes a {{{BSPPeer}}}
as a parameter (e.g. setup, bsp, cleanup).
+ 
+ In this case we read a normal text file:
+ {{{
+  @Override
+   public final void bsp(
+       BSPPeer<LongWritable, Text, KEYOUT, VALUEOUT> peer)
+       throws IOException, InterruptedException, SyncException {
+       
+       // this method reads the next key value record from file
+       KeyValuePair<LongWritable, Text> pair = peer.readNext();
+ 
+       // the following lines do the same:
+       LongWritable key = new LongWritable();
+       Text value = new Text();
+       peer.readNext(key, value);
+   }
+ }}}
+ 
+ Consult the docs for more detail on events like end of file.
+ 
+ There is also a function which allows you to re-read the input from the beginning.
+ 
+ This snippet reads the input five times:
+ 
+ {{{
+   for(int i = 0; i < 5; i++){
+     LongWritable key = new LongWritable();
+     Text value = new Text();
+     while (peer.readNext(key, value)) {
+        // read everything
+     }
+     // reopens the input
+     peer.reopenInput();
+   }
+ }}}
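
The read-then-reopen pattern can be tried without a cluster using a small in-memory stand-in. The class below is hypothetical and is not part of Hama: it only mimics the two calls used above, and its `readNext()` returns `null` at the end of the input instead of the boolean returned by the real `readNext(key, value)`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the reading side of a peer; not a Hama class.
final class RereadableInputSketch {
    private final List<String> lines;
    private Iterator<String> cursor;

    RereadableInputSketch(List<String> lines) {
        this.lines = lines;
        this.cursor = lines.iterator();
    }

    // Returns the next record, or null once the input is exhausted.
    String readNext() {
        return cursor.hasNext() ? cursor.next() : null;
    }

    // Rewinds to the first record so the input can be read again.
    void reopenInput() {
        cursor = lines.iterator();
    }

    // Reads the whole input `passes` times and returns the total record count.
    static int countOverPasses(RereadableInputSketch input, int passes) {
        int total = 0;
        for (int i = 0; i < passes; i++) {
            while (input.readNext() != null) {
                total++;
            }
            input.reopenInput();
        }
        return total;
    }

    public static void main(String[] args) {
        RereadableInputSketch input =
            new RereadableInputSketch(Arrays.asList("a", "b", "c"));
        System.out.println(countOverPasses(input, 5));
    }
}
```

Without the `reopenInput()` call, every pass after the first would see an exhausted input and read nothing.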
+ 
+ === Custom InputFormat ===
+ 
+ TODO: Describe how to implement your own InputFormat.
+ 
+ == Output ==
+ 
+ === Configuring Output ===
+ 
+ === Using Output ===
+ 
+ === Custom OutputFormat ===
+ 
+ == Implementation notes ==
+ 
+ === Internal implementation details ===
+ 
  BSPJobClient
   
   1. Create the splits for the job
@@ -12, +108 @@

   1. Receives splitFile
   2. Add split argument to TaskInProgress constructor
  
+ Task
+ 
+  1. Gets its split from the Groom
+  2. Initializes everything in BSPPeerImpl
+ 
