hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [jira] Commented: (HADOOP-939) No-sort optimization
Date Mon, 29 Jan 2007 18:17:51 GMT
Arkady Borkovsky wrote:
> Does this model assume that the size of the output of reduce is similar 
> to the size of the input?
> 
> An important class of applications (mentioned in this thread before) 
> uses two inputs:
> -- M ("master file") -- very large, presorted and not changing from run 
> to run,
> -- D ("details file") -- smaller, different from run to run, not 
> necessarily presorted
> and the output size is proportional to the size of D.
> In this case the gain from "no-sort" may be much higher, as the 13 
> "transfer and write" to DFS are applied to a smaller amount of data, 
> while 11 (b-d) sort-n-shuffle-related steps are saved on the larger data.

Could a combiner be used in this hypothetical case?  If so, then the b-d 
steps might be faster too.
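A combiner runs on each map task's output before the sort and shuffle, merging records that share a key so fewer records go through those steps. The sketch below is a plain-Java simulation of that idea, not Hadoop's API. It assumes a hypothetical `Pair` record and a word-count-style sum. In Hadoop itself, a Reducer class is registered as the combiner (for example via `JobConf.setCombinerClass`). This only works when the reduce function is associative and commutative, which is why the question above is conditional ("If so") for the master/details case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {
    // Hypothetical map-output record: a key plus a partial value.
    record Pair(String key, long value) {}

    // Combiner: sums values per key within ONE map task's output.
    // Valid only because addition is associative and commutative.
    static List<Pair> combine(List<Pair> mapOutput) {
        Map<String, Long> sums = new TreeMap<>(); // sorted by key for readability
        for (Pair p : mapOutput) {
            sums.merge(p.key(), p.value(), Long::sum);
        }
        List<Pair> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            out.add(new Pair(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulated output of one map task, before sort/shuffle.
        List<Pair> mapOutput = List.of(
            new Pair("a", 1), new Pair("b", 1), new Pair("a", 1),
            new Pair("a", 1), new Pair("c", 1), new Pair("b", 1));
        List<Pair> combined = combine(mapOutput);
        // Fewer records now need to be sorted, shuffled, and merged.
        System.out.println("records before combine: " + mapOutput.size());
        System.out.println("records after combine:  " + combined.size());
        System.out.println(combined);
    }
}
```

Here six map-output records shrink to three before the shuffle. The reducer still sees the correct per-key totals because summing partial sums gives the same result as summing the originals.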

Doug
