crunch-user mailing list archives

From Vincent Fabro <vincent.fabro.nu...@gmail.com>
Subject Access number of reducer tasks from Crunch
Date Sat, 02 May 2015 22:12:32 GMT
Dear all,

Is it possible to access the number of reducer tasks from Crunch (something
equivalent to context.getNumReduceTasks() in Hadoop)?

Context: I'm porting Nutch to Crunch. One operation (in GeneratorJob.java,
GeneratorMapper.java and GeneratorReducer.java -
https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/crawl/GeneratorReducer.java)
takes the top n URLs according to a score. If I understand correctly,
"n / number of reduce tasks" URLs are selected by each reduce task
(GeneratorReducer, line 102). If the shuffle spreads the URLs evenly across
the reduce tasks, the result is good enough.

Thanks in advance!

Vincent
