nutch-dev mailing list archives

From "Charlie Williams" <cwilliams.w...@gmail.com>
Subject JobConf Questions
Date Tue, 06 Feb 2007 13:42:33 GMT
I am very new to the Nutch source code and have been reading over the
Injector class. From what I understood of the MapReduce model, a job needs
both a map step and a reduce step in order to function properly. However,
in CrawlDb.createJob( Configuration, Path ) a new job is created for
merging the injected URLs that has no Mapper class set. (For contrast, I
have sketched the job shape I expected just after the quoted code.)

..

JobConf job = new NutchJob(config);
job.setJobName("crawldb " + crawlDb);

// use the existing CrawlDb (if any) as one of the job's inputs
Path current = new Path(crawlDb, CrawlDatum.DB_DIR_NAME);
if (FileSystem.get(job).exists(current)) {
  job.addInputPath(current);
}

job.setInputFormat(SequenceFileInputFormat.class);
job.setInputKeyClass(UTF8.class);
job.setInputValueClass(CrawlDatum.class);

// a reducer is set, but no mapper
job.setReducerClass(CrawlDbReducer.class);

// newCrawlDb is defined earlier in the method (elided above)
job.setOutputPath(newCrawlDb);
job.setOutputFormat(MapFileOutputFormat.class);
job.setOutputKeyClass(UTF8.class);
job.setOutputValueClass(CrawlDatum.class);

return job;
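
For contrast, here is the shape I expected every job to have, with an
explicit map step (a rough sketch against the org.apache.hadoop.mapred
interfaces; ExampleMapper and its behaviour are purely illustrative, not
taken from Nutch):

import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// illustrative only: a mapper that passes every record straight through
public class ExampleMapper extends MapReduceBase implements Mapper {
  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter)
      throws IOException {
    output.collect(key, value);  // emit each input record unchanged
  }
}

and then, on the job:

job.setMapperClass(ExampleMapper.class);  // the explicit map step I expected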


How does this code function properly?

Is it designed to run only on a single machine, and thus not need a mapper
class set?
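
The only lead I have turned up so far is JobConf.getMapperClass(), which
appears to fall back to IdentityMapper when no mapper has been set --
roughly this (paraphrasing the Hadoop source from memory, so the exact
signature may differ):

public Class getMapperClass() {
  // when "mapred.mapper.class" is unset, default to the identity mapper
  return getClass("mapred.mapper.class", IdentityMapper.class, Mapper.class);
}

Is that default what lets the merge job run without an explicit mapper?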

Thanks for any help,

-Charles Williams
