crunch-user mailing list archives

From David Ortiz <dpo5...@gmail.com>
Subject Re: Access number of reducer tasks from Crunch
Date Sun, 03 May 2015 01:16:36 GMT
Do you actually care about the number of reducers, or do you just want the
top n entries from a table?  The latter is built into the framework.

On Sat, May 2, 2015, 6:12 PM Vincent Fabro <vincent.fabro.nutch@gmail.com>
wrote:

> Dear all
>
> Is it possible to access the number of reducer tasks from Crunch
> (something equivalent to context.getNumReduceTasks() in Hadoop)?
>
> Context: I'm porting Nutch to Crunch. One operation (in
> GeneratorJob.java, GeneratorMapper.java and GeneratorReducer.java -
> https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/crawl/GeneratorReducer.java)
> takes the top n URLs according to a score. If I understand correctly,
> "n / number of reduce tasks" URLs are selected for each reduce task
> (GeneratorReducer, line 102). If the shuffle distributes URLs evenly, the
> result is good enough.
>
> Thanks in advance!
>
> Vincent
>
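
For readers following the thread, here is a minimal plain-Java sketch of the per-reducer quota idea described in the quoted message: each partition keeps its own top "n / number of partitions" scores. It does not use Crunch, Hadoop, or Nutch APIs; the class name PerPartitionTopN, the method names, and the sample scores are all hypothetical and only illustrate why the result depends on how evenly the shuffle spreads high-scoring URLs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: simulates reduce partitions as in-memory lists.
public class PerPartitionTopN {

    // Keep at most `quota` of the highest scores from a single partition.
    static List<Double> topOfPartition(List<Double> scores, int quota) {
        List<Double> sorted = new ArrayList<>(scores);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(quota, sorted.size())));
    }

    // Give every partition the same quota of n / numPartitions (integer division)
    // and concatenate the per-partition results in partition order.
    static List<Double> select(List<List<Double>> partitions, int n) {
        int quota = n / partitions.size();
        List<Double> selected = new ArrayList<>();
        for (List<Double> partition : partitions) {
            selected.addAll(topOfPartition(partition, quota));
        }
        return selected;
    }

    public static void main(String[] args) {
        List<List<Double>> even = new ArrayList<>();
        even.add(List.of(0.9, 0.1, 0.5));
        even.add(List.of(0.8, 0.7, 0.2));
        // High scores are spread across both partitions.
        System.out.println(select(even, 4));

        List<List<Double>> skewed = new ArrayList<>();
        skewed.add(List.of(0.9, 0.8, 0.7));
        skewed.add(List.of(0.1, 0.2, 0.3));
        // All high scores landed in one partition, so 0.8 is dropped for 0.3.
        System.out.println(select(skewed, 2));
    }
}
```

With the evenly spread input, the selection matches the true global top 4; with the skewed input it does not, which is the "good enough if the shuffle is good" caveat from the original question. A framework-level top-n, as suggested in the reply, avoids depending on the reducer count at all.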
