crunch-user mailing list archives

From David Ortiz <>
Subject Force new Map phase
Date Mon, 27 Jul 2015 21:58:37 GMT

     Are there any easy tricks to force a new map stage to kick off? I know I can force a
reduce with GBK operations, but one of our jobs is running into data skew. From what I can
tell, a couple of hot keys join properly, but when the follow-up processing that comes before
the next join runs, the reducer hits the GC overhead limit. Based on the dot file, the plan
does all of the preprocessing for the next join in the reducer of the first join. That work
could easily be done in the map phase before the next join in the pipeline without any issues,
and I think this would also get past the memory problem we're having. The only solution I
could think of at the moment is to do everything up to the first join, call pipeline.done(),
and then add the remaining operations before another pipeline.done() call.
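
For illustration only, here is a minimal plain-Java toy model of the two-job split described above. It does not use Crunch: the class and method names (`TwoStagePlan`, `join`, `materialize`, `preprocess`) are hypothetical, and copying a list stands in for writing and re-reading an intermediate file. Job 1 ends at the join output; job 2 applies the per-record preprocessing, which needs no grouping and so could run in mappers rather than in the join reducer. In Crunch itself the boundary would presumably be a write of the joined table to a target followed by `pipeline.run()` (which, unlike `done()`, does not clean up intermediate files), but that mapping is an assumption here, not something tested.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy, in-memory model of the proposed plan; this is NOT the Crunch API.
// Input records are {key, value}; joined records are {key, leftValue, rightValue}.
class TwoStagePlan {

    // Job 1 (stands in for the reduce side): inner join on key. Right-side
    // values are grouped per key, so a hot key concentrates work in one place.
    static List<String[]> join(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> rightByKey = new HashMap<>();
        for (String[] r : right) {
            rightByKey.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        List<String[]> joined = new ArrayList<>();
        for (String[] l : left) {
            for (String rv : rightByKey.getOrDefault(l[0], List.of())) {
                joined.add(new String[] {l[0], l[1], rv});
            }
        }
        return joined;
    }

    // Boundary between jobs: job 1 writes its output and job 2 reads it back.
    // Copying the list is a stand-in for the intermediate file.
    static List<String[]> materialize(List<String[]> records) {
        return new ArrayList<>(records);
    }

    // Job 2 (stands in for the map side): record-at-a-time preprocessing that
    // needs no grouping, so each record is handled independently.
    static List<String[]> preprocess(List<String[]> records, UnaryOperator<String[]> fn) {
        List<String[]> out = new ArrayList<>(records.size());
        for (String[] rec : records) {
            out.add(fn.apply(rec));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = List.of(new String[] {"hot", "a"}, new String[] {"cold", "b"});
        List<String[]> right = List.of(
                new String[] {"hot", "x"}, new String[] {"hot", "y"}, new String[] {"cold", "z"});
        List<String[]> joined = materialize(join(left, right));
        // Hypothetical preprocessing step: upper-case the left-side value.
        List<String[]> prepped =
                preprocess(joined, r -> new String[] {r[0], r[1].toUpperCase(), r[2]});
        for (String[] r : prepped) {
            System.out.println(String.join(",", r));
        }
    }
}
```

Running `main` prints `hot,A,x`, `hot,A,y` and `cold,B,z`, one per line: the hot key's two matches are preprocessed one record at a time after the join output has been materialized.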
