hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-988) mapjoin should throw an error if the input is too large
Date Mon, 14 Dec 2009 22:23:25 GMT
mapjoin should throw an error if the input is too large
-------------------------------------------------------

                 Key: HIVE-988
                 URL: https://issues.apache.org/jira/browse/HIVE-988
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: Ning Zhang
             Fix For: 0.5.0


If the input to the map join is larger than a certain threshold, the join may execute very
slowly. It is better to throw an error and let the user rerun the query as a regular
(non map-join) join.

However, the current map-reduce framework will retry the mapper 4 times before actually
killing the job.
Based on an offline discussion among Dhruba, Ning, and myself, we came up with the following
algorithm:

Keep a threshold in the mapper for the number of rows processed by the map join. If the
number of rows exceeds that threshold, set a counter and kill that mapper.
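
A minimal sketch of that mapper-side check, assuming a hypothetical helper class with
illustrative counter and threshold names (not the actual Hive implementation):

    import org.apache.hadoop.mapred.Reporter;

    public class MapJoinRowLimitCheck {
      // Illustrative counter group/name used to signal the client.
      public static final String COUNTER_GROUP = "HiveMapJoin";
      public static final String FATAL_COUNTER = "FATAL_ERROR";

      private final long maxRows;   // threshold, e.g. read from a config property
      private long rowsProcessed = 0;

      public MapJoinRowLimitCheck(long maxRows) {
        this.maxRows = maxRows;
      }

      // Called once per row that the map join loads into memory.
      public void checkRow(Reporter reporter) {
        rowsProcessed++;
        if (rowsProcessed > maxRows) {
          // Set the counter first so the client can tell this failure apart
          // from an ordinary task error, then fail the mapper.
          reporter.incrCounter(COUNTER_GROUP, FATAL_COUNTER, 1);
          throw new RuntimeException("map join input exceeded " + maxRows + " rows");
        }
      }
    }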

The client (ExecDriver) monitors the job continuously: if this counter is set, it kills the
job and shows an appropriate error message to the user, so that the query can be retried
without the map join.
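
A minimal sketch of that monitoring loop, assuming the old org.apache.hadoop.mapred client
API and the illustrative counter names from the sketch above:

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.RunningJob;

    public class MapJoinJobMonitor {
      public static void monitor(RunningJob runningJob) throws Exception {
        while (!runningJob.isComplete()) {
          Thread.sleep(1000);   // polling interval is illustrative
          Counters counters = runningJob.getCounters();
          long fatal = counters.findCounter("HiveMapJoin", "FATAL_ERROR").getCounter();
          if (fatal > 0) {
            // Kill the job right away instead of letting the framework retry
            // the failed mapper, and tell the user how to proceed.
            runningJob.killJob();
            System.err.println("Map join input is too large; "
                + "rerun the query without the map join hint.");
            return;
          }
        }
      }
    }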



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

