hadoop-common-user mailing list archives

From "Jimmy Wan" <ji...@indeed.com>
Subject Equivalent of cmdline head or tail?
Date Thu, 06 Mar 2008 18:40:14 GMT
I've got some jobs where I'd like to just pull out the top N or bottom N  
values.

It seems like I can't do this in the map or combine phases (no single
task sees enough of the data), but I could aggregate the data during the
reduce phase. The problem is that I won't know which values to write out
until I've gone through the entire set, and by that point reduce is no
longer being called.
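
To make the problem concrete, here is a minimal sketch in plain Python (not
Hadoop code) of the buffering involved: keep at most N values in a bounded
heap and emit them only after the last input value. The names TopN, add,
flush, and top_n, and the sample data, are hypothetical and chosen for
illustration. The sketch also assumes that each task can compute its own
local top N and that a single final step merges those lists; whether and
where a job framework offers an end-of-input hook to trigger the flush is
an assumption to check against its API.

```python
import heapq


class TopN:
    """Keep the N largest values seen so far in a bounded min-heap."""

    def __init__(self, n):
        self.n = n
        self._heap = []  # min-heap: the smallest retained value is at index 0

    def add(self, value):
        if self.n <= 0:
            return
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, value)
        elif value > self._heap[0]:
            # The new value beats the smallest retained one, so swap it in.
            heapq.heapreplace(self._heap, value)

    def flush(self):
        """Return the retained values, largest first. Call after all input."""
        return sorted(self._heap, reverse=True)


def top_n(values, n):
    """Convenience wrapper: feed every value, then flush once at the end."""
    acc = TopN(n)
    for v in values:
        acc.add(v)
    return acc.flush()


# Hypothetical input, split across two tasks.
partitions = [[5, 1, 9, 3], [7, 8, 2, 6]]

# Each task keeps only its local top 3 ...
local = [top_n(part, 3) for part in partitions]

# ... and one final step merges the small local lists into the global top 3.
merged = top_n([v for part in local for v in part], 3)
print(merged)
```

Memory stays bounded by N per task, since only the retained values are
held. For the bottom N, the same structure works on negated values, with
the sign flipped back when flushing.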

It's easy enough to post-process with some combination of sort, head, and  
tail, but I was wondering if I was missing something obvious.

-- 
Jimmy
