hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] Created: (HIVE-988) mapjoin should throw an error if the input is too large
Date Mon, 14 Dec 2009 22:23:25 GMT
mapjoin should throw an error if the input is too large

                 Key: HIVE-988
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: Ning Zhang
             Fix For: 0.5.0

If the input to the map join is larger than a certain threshold, the join may execute very slowly.
It is better to throw an error and let the user rerun the query as a non-map-join query.

However, the current map-reduce framework retries a failed mapper 4 times before actually killing
the job.
Based on an offline discussion among Dhruba, Ning, and myself, we came up with the following:

Keep a threshold in the mapper for the number of rows to be processed for the map join. If the
number of rows exceeds that threshold, set a counter and kill that mapper.
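The mapper-side check could look like the following minimal sketch. The class name, the counter name, and killing the mapper by throwing are all illustrative assumptions; the real implementation would update a Hadoop Counter via the Reporter rather than the plain map used here.

```java
import java.util.HashMap;
import java.util.Map;

public class MapJoinRowLimit {
    // Stands in for the Hadoop job counters the real mapper would update.
    static final Map<String, Long> counters = new HashMap<>();

    // Assumed counter name, used only for this sketch.
    static final String FATAL_COUNTER = "MAPJOIN_INPUT_TOO_LARGE";

    final long rowThreshold;
    long rowsSeen = 0;

    MapJoinRowLimit(long rowThreshold) {
        this.rowThreshold = rowThreshold;
    }

    // Called once per small-table row. When the threshold is exceeded,
    // flag the counter first (so the client can see why the task died),
    // then kill this mapper -- modeled here as an exception.
    void processRow() {
        if (++rowsSeen > rowThreshold) {
            counters.merge(FATAL_COUNTER, 1L, Long::sum);
            throw new RuntimeException(
                "map-join input exceeds " + rowThreshold + " rows");
        }
    }
}
```

The key ordering is that the counter is set before the task dies, so the failure reason is visible to the client even though the mapper itself never returns normally.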

The client (ExecDriver) monitors the job continuously. If this counter is set, it kills the job
and shows an appropriate error message to the user, so that the query can be retried without the
map join.
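The client-side monitoring could be sketched as below. In real Hive this loop would poll the running job's counters through the map-reduce client API; here a small interface stands in for the job handle, and the counter name is the same assumed name as above.

```java
public class MapJoinMonitor {
    // Minimal stand-in for a running map-reduce job handle.
    interface Job {
        boolean isComplete();
        long getCounter(String name); // stands in for a Counters lookup
        void kill();
    }

    // Assumed counter name, used only for this sketch.
    static final String FATAL_COUNTER = "MAPJOIN_INPUT_TOO_LARGE";

    // Poll the job until it finishes; if the fatal counter is set,
    // kill the job and report back so the caller can surface an error.
    // Returns true if the job was killed because of the counter.
    static boolean monitor(Job job) {
        while (!job.isComplete()) {
            if (job.getCounter(FATAL_COUNTER) > 0) {
                job.kill();
                System.err.println(
                    "Map-join input too large; rerun without the map join.");
                return true;
            }
        }
        return false;
    }
}
```

Killing the job from the client side is what avoids the framework's 4 automatic mapper retries: the job is torn down as soon as the counter appears, rather than after every retry has failed.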

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
