hadoop-common-user mailing list archives

From Mike Forrest <mforr...@trailfire.com>
Subject problem with IdentityMapper
Date Thu, 10 Jan 2008 22:51:05 GMT
I'm running into a problem where IdentityMapper seems to produce far too
much data. For example, I have a job that reads a sequence file with
IdentityMapper and then uses IdentityReducer to write everything back
out to another sequence file. My input is a ~60MB sequence file, and
after the map phase has completed, the job tracker UI reports about 10GB
for "Map output bytes". It seems the output collector is not being reset
properly: each map output record has the correct key, but its value
contains all the data encountered up to that point. I think this is a
known issue, but I can't find any discussion of it right now. Has anyone
else run into this, and if so, is there a solution? I'm using the latest
code in the 0.15 branch.
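A minimal, self-contained Java sketch of the symptom described above, assuming (hypothetically) that one value object is reused for every record and that its deserialization step appends to an internal buffer instead of clearing it first. `ReuseDemo`, `ReusableValue`, `readFieldsBuggy` and `readFieldsFixed` are illustrative names, not Hadoop classes or APIs, and this is not a confirmed diagnosis of the 0.15 behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: no Hadoop classes are used here.
class ReuseDemo {

    // Stand-in for a value object that the framework reuses across records.
    static class ReusableValue {
        private final StringBuilder buf = new StringBuilder();

        // Buggy load: appends the new record without clearing old contents.
        void readFieldsBuggy(String record) {
            buf.append(record);
        }

        // Correct load: resets state before reading the new record.
        void readFieldsFixed(String record) {
            buf.setLength(0);
            buf.append(record);
        }

        @Override
        public String toString() {
            return buf.toString();
        }
    }

    // Simulates an identity map loop that reuses a single value instance.
    static List<String> run(List<String> records, boolean fixed) {
        ReusableValue value = new ReusableValue();
        List<String> emitted = new ArrayList<>();
        for (String r : records) {
            if (fixed) {
                value.readFieldsFixed(r);
            } else {
                value.readFieldsBuggy(r);
            }
            emitted.add(value.toString()); // emit exactly what was read
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c");
        System.out.println(run(input, false)); // prints [a, ab, abc]
        System.out.println(run(input, true));  // prints [a, b, c]
    }
}
```

On the buggy path the i-th emitted value carries all i records read so far, so total output grows roughly quadratically with the number of records, which is the same kind of blow-up as a ~60MB input turning into ~10GB of map output. Resetting state before loading each record keeps output proportional to input.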
