hadoop-common-user mailing list archives

From Mike Forrest <mforr...@trailfire.com>
Subject Re: problem with IdentityMapper
Date Thu, 10 Jan 2008 23:20:29 GMT
I'm using Text for the keys and MapWritable for the values.

Joydeep Sen Sarma wrote:
> What are the key/value types in the SequenceFile?
> It seems that the MapRunner calls createKey and createValue just once, so if the value serializes
out its entire allocated contents (and not just what it last read), it would cause this problem.
> (I have periodically shot myself in the foot with this bullet.)
> ________________________________
> From: Mike Forrest [mailto:mforrest@trailfire.com]
> Sent: Thu 1/10/2008 2:51 PM
> To: hadoop-user@lucene.apache.org
> Subject: problem with IdentityMapper
> Hi,
> I'm running into a problem where IdentityMapper seems to produce way too
> much data.  For example, I have a job that reads a sequence file using
> IdentityMapper and then uses IdentityReducer to write everything back
> out to another sequence file.  My input is a ~60MB sequence file and
> after the map phase has completed, the job tracker UI reports about 10GB
> for "Map output bytes".  It seems like the output collector does not get
> properly reset, so each record the mapper emits has the correct key but
> a value containing all the data encountered up to that point.  I think
> this is a known issue but I can't seem to find any
> discussion about it right now.  Has anyone else run into this, and if
> so, is there a solution?  I'm using the latest code in the 0.15 branch.
> Thanks
> Mike
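
The object-reuse pitfall Joydeep describes can be sketched without any Hadoop dependency. The class and method names below are illustrative stand-ins that mirror the Writable `write`/`readFields` convention; they are not Hadoop's actual classes. The key point is that MapRunner-style code creates one value object and calls `readFields` on it repeatedly, so a type that does not reset its state before deserializing will accumulate everything it has ever read:

```java
import java.io.*;
import java.util.*;

// Hypothetical stand-in for a map-valued Writable; names are illustrative.
class BuggyMapValue {
    final Map<String, String> entries = new HashMap<>();

    void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, String> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    // BUG: previous contents are not cleared, so a reused instance keeps
    // every entry it has ever read -- the accumulation seen in the thread.
    void readFields(DataInput in) throws IOException {
        // entries.clear();  // the fix: reset state before deserializing
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            entries.put(in.readUTF(), in.readUTF());
        }
    }
}

public class ReusePitfall {
    public static void main(String[] args) throws IOException {
        // Serialize two one-entry records back to back.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        BuggyMapValue rec = new BuggyMapValue();
        rec.entries.put("k1", "v1");
        rec.write(out);
        rec.entries.clear();
        rec.entries.put("k2", "v2");
        rec.write(out);

        // MapRunner-style reuse: one value object for the whole stream.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        BuggyMapValue reused = new BuggyMapValue();
        reused.readFields(in);
        System.out.println(reused.entries.size()); // 1: just k1 so far
        reused.readFields(in);
        System.out.println(reused.entries.size()); // 2: k1 leaked into record 2
    }
}
```

With the `entries.clear()` line uncommented, the second read would report a size of 1, which is the behavior a reusable Writable must guarantee.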
