nutch-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "Getting Started" by SteveSeverance
Date Tue, 20 Mar 2007 02:33:08 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by SteveSeverance:
http://wiki.apache.org/nutch/Getting_Started

------------------------------------------------------------------------------
   2. Then submit your job to Hadoop by calling JobClient.runJob. JobClient.runJob
submits the job and handles the status updates that come back from it. It starts by
creating an instance of JobClient, then pushes the job toward execution by calling
JobClient.submitJob.
   3. JobClient.submitJob handles splitting the input files and generating the MapReduce task.
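
The flow described in steps 2 and 3 can be pictured with a small stand-alone model. This is only a sketch: `RunJobSketch`, `JobHandle`, and the one-split-per-file rule are hypothetical stand-ins invented for illustration. None of them are Hadoop classes, and a real job runs asynchronously on the cluster rather than in a local loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the submit-then-poll flow; none of these types are Hadoop classes. */
class RunJobSketch {

    /** Hypothetical handle to a submitted job, tracking how many tasks have finished. */
    static class JobHandle {
        final List<String> splits;
        int finished = 0;

        JobHandle(List<String> splits) { this.splits = splits; }

        boolean isComplete() { return finished >= splits.size(); }

        /** Simulates one task finishing; a real cluster does this asynchronously. */
        void advance() { if (!isComplete()) finished++; }
    }

    /** Stand-in for the submit step: one split (and so one task) per input file. */
    static JobHandle submitJob(List<String> inputFiles) {
        List<String> splits = new ArrayList<>();
        for (String file : inputFiles) {
            splits.add("split-of-" + file);
        }
        return new JobHandle(splits);
    }

    /** Stand-in for the run step: submit, then collect status updates until done. */
    static List<String> runJob(List<String> inputFiles) {
        JobHandle job = submitJob(inputFiles);
        List<String> statusUpdates = new ArrayList<>();
        while (!job.isComplete()) {
            job.advance();
            statusUpdates.add(job.finished + "/" + job.splits.size() + " tasks done");
        }
        return statusUpdates;
    }

    public static void main(String[] args) {
        for (String update : runJob(Arrays.asList("part-00000", "part-00001"))) {
            System.out.println(update);
        }
    }
}
```

The point of the model is the division of labour: the run step owns the status loop, while the submit step owns splitting the input and creating the tasks.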
  
+ === How do I open Nutch's data files? ===
+ To read Nutch's data files, use Hadoop's MapFile and SequenceFile classes.
This simple code sample opens a MapFile and prints each key and value it contains.
+ 
+ {{{
+ // Needs org.apache.hadoop.io.MapFile, Writable and WritableComparable.
+ // Assumes fs (FileSystem), seqFile (path of the MapFile), conf (Configuration)
+ // and out (PrintStream) are already defined.
+ MapFile.Reader reader = new MapFile.Reader(fs, seqFile, conf);
+ 
+ try {
+     // Create one key and one value object; next() refills them on every call.
+     WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
+     Writable value = (Writable) reader.getValueClass().newInstance();
+ 
+     // next() returns false once there are no more entries.
+     while (reader.next(key, value)) {
+         out.println(key);
+         out.println(value);
+     }
+ 
+     reader.close();
+ } catch (Exception e) {
+     e.printStackTrace();
+     out.println("Exception occurred. " + e);
+ }
+ 
+ }}}
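
The read loop above fills a key object and a value object through each `next` call. The following stand-alone sketch illustrates that idiom without depending on Hadoop. `ReadLoopSketch`, `Record`, `PairReader`, and the binary layout are all hypothetical, made up for this example, and are not how MapFile or SequenceFile store data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Stand-alone illustration of the reuse-one-record read loop; not Hadoop code. */
class ReadLoopSketch {

    /** Hypothetical mutable record, playing the role of a key/value pair. */
    static class Record {
        String key;
        int value;
    }

    /** Hypothetical reader: next() overwrites the given record, or returns false at end. */
    static class PairReader {
        private final DataInputStream in;

        PairReader(byte[] data) {
            in = new DataInputStream(new ByteArrayInputStream(data));
        }

        boolean next(Record rec) throws IOException {
            if (in.available() == 0) {
                return false;  // no bytes left: end of data
            }
            rec.key = in.readUTF();
            rec.value = in.readInt();
            return true;
        }
    }

    /** Writes key/value pairs in the simple binary layout PairReader expects. */
    static byte[] write(String[] keys, int[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < keys.length; i++) {
                out.writeUTF(keys[i]);
                out.writeInt(values[i]);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads every pair, reusing a single Record instance across iterations. */
    static List<String> readAll(byte[] data) {
        try {
            PairReader reader = new PairReader(data);
            Record rec = new Record();  // allocated once, refilled by each next() call
            List<String> lines = new ArrayList<>();
            while (reader.next(rec)) {
                lines.add(rec.key + " -> " + rec.value);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the record is overwritten on every call, anything that must outlive the current iteration has to be copied out, which is why the sketch stores formatted strings rather than the `Record` itself.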
+ 
  == Tutorials ==
   * CountLinks Counting outbound links with MapReduce
  
