nutch-dev mailing list archives

From: Dennis Kubes <nutch-...@dragonflymc.com>
Subject: Re: JobConf Questions
Date: Tue, 06 Feb 2007 15:12:55 GMT
If no mapper or reducer class is set in the JobConf, the framework 
defaults to IdentityMapper and IdentityReducer respectively, which 
are essentially pass-throughs of key/value pairs.
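
To make that concrete, the two defaults behave roughly like the 
sketch below. This is a minimal sketch against the old 
org.apache.hadoop.mapred API of this era; PassThroughMapper and 
PassThroughReducer are hypothetical stand-ins for the real 
IdentityMapper and IdentityReducer in org.apache.hadoop.mapred.lib, 
so signatures may differ slightly across Hadoop versions.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Map phase: emit each input record unchanged.
class PassThroughMapper extends MapReduceBase implements Mapper {
  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter)
      throws IOException {
    output.collect(key, value);
  }
}

// Reduce phase: emit every value for a key unchanged.
class PassThroughReducer extends MapReduceBase implements Reducer {
  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      output.collect(key, (Writable) values.next());
    }
  }
}

So in the crawldb job quoted below, each <UTF8, CrawlDatum> pair 
flows through the implicit identity map step unchanged, and 
CrawlDbReducer does the actual merging.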

Dennis Kubes

Charlie Williams wrote:
> I am very new to the Nutch source code, and have been reading over the
> Injector class code. From what I understood of the MapReduce system, there
> had to be both a map and a reduce step in order for the algorithm to
> function properly. However, in CrawlDb.createJob(Configuration, Path) a new
> job is created for merging the injected URLs that has no Mapper class set.
> 
> ..
> 
> JobConf job = new NutchJob(config);
> job.setJobName("crawldb " + crawlDb);
> 
> 
> Path current = new Path(crawlDb, CrawlDatum.DB_DIR_NAME);
> if (FileSystem.get(job).exists(current)) {
>   job.addInputPath(current);
> }
> 
> job.setInputFormat(SequenceFileInputFormat.class);
> job.setInputKeyClass(UTF8.class);
> job.setInputValueClass(CrawlDatum.class);
> 
> job.setReducerClass(CrawlDbReducer.class);
> 
> job.setOutputPath(newCrawlDb);
> job.setOutputFormat(MapFileOutputFormat.class);
> job.setOutputKeyClass(UTF8.class);
> job.setOutputValueClass(CrawlDatum.class);
> 
> return job;
> 
> 
> How does this code function properly?
> 
> Is it designed to run on only a single machine, and thus does not need a
> mapper class set?
> 
> Thanks for any help,
> 
> -Charles Williams
> 
