[ https://issues.apache.org/jira/browse/HIVE-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ning Zhang updated HIVE-988:
----------------------------
Attachment: HIVE-988.patch
Uploading HIVE-988.patch. Changes include:
1) Add a FATAL_ERROR counter to each operator.
2) Increment the FATAL_ERROR counter in MapJoinOperator.
3) Modify Operator.process() to check for a fatal error and return without doing anything if one has occurred.
4) In ExecDriver.progress(), check the FATAL_ERROR counter of each operator and retrieve the error message.
5) Kill the whole job if a FATAL_ERROR counter is set.
6) Many changes to TestParser due to the addition of the FATAL_ERROR counter.
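
For readers who do not have the patch open, items 1) to 3) can be illustrated with a minimal, self-contained sketch. This is not the code in HIVE-988.patch: the class SketchOperator, its maxRows threshold, and all method names are hypothetical stand-ins, and the real Operator and MapJoinOperator classes are not used here.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an operator; NOT Hive's actual Operator class.
class SketchOperator {
    // Item 1: a per-operator fatal error counter.
    private final AtomicLong fatalErrorCount = new AtomicLong();
    // Assumed row threshold; the real configuration name is not given in this message.
    private final long maxRows;
    private long rowsSeen = 0;
    private long rowsForwarded = 0;

    SketchOperator(long maxRows) {
        this.maxRows = maxRows;
    }

    boolean hasFatalError() {
        return fatalErrorCount.get() > 0;
    }

    void process(Object row) {
        // Item 3: once a fatal error is recorded, do nothing for later rows.
        if (hasFatalError()) {
            return;
        }
        rowsSeen++;
        // Item 2: record the fatal error when the threshold is exceeded.
        if (rowsSeen > maxRows) {
            fatalErrorCount.incrementAndGet();
            return;
        }
        rowsForwarded++; // placeholder for the real per-row work
    }

    long getRowsForwarded() {
        return rowsForwarded;
    }

    long getFatalErrorCount() {
        return fatalErrorCount.get();
    }
}
```

With a threshold of 2, the third row sets the counter and every later call to process() returns immediately, so the counter is incremented only once.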
> mapjoin should throw an error if the input is too large
> -------------------------------------------------------
>
> Key: HIVE-988
> URL: https://issues.apache.org/jira/browse/HIVE-988
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Ning Zhang
> Fix For: 0.5.0
>
> Attachments: HIVE-988.patch
>
>
> If the input to the map join is larger than a specific threshold, it may lead to a very slow execution of the join.
> It is better to throw an error and let the user redo his query as a non-map-join query.
> However, the current map-reduce framework will retry the mapper 4 times before actually killing the job.
> Based on an offline discussion with Dhruba, Ning and myself, we came up with the following algorithm:
> Keep a threshold in the mapper for the number of rows to be processed for map-join. If the number of rows exceeds that threshold, set a counter and kill that mapper.
> The client (ExecDriver) monitors that job continuously; if this counter is set, it kills the job and also shows an appropriate error message to the user, so that he can retry the query without the map join.
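
The client side of the algorithm described above can be sketched as a simple polling loop. This is only an illustration under stated assumptions: JobHandleSketch, JobMonitorSketch, and FakeJobSketch are hypothetical names, the error text is made up, and the sketch does not call the real ExecDriver or Hadoop job APIs.

```java
// Hypothetical view of a running job; NOT the Hadoop job interface.
interface JobHandleSketch {
    boolean isComplete();
    long fatalErrorCounter(); // assumed aggregate of the mappers' counters
    void kill();
}

class JobMonitorSketch {
    // Polls the job until it finishes. Returns null on normal completion,
    // or an error message if the job was killed because the counter was set.
    static String monitor(JobHandleSketch job, long pollMillis) {
        while (!job.isComplete()) {
            if (job.fatalErrorCounter() > 0) {
                // Kill the whole job at once instead of waiting for mapper retries.
                job.kill();
                return "Map join input exceeded the row threshold; "
                        + "retry the query without the map join.";
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "Monitoring was interrupted.";
            }
        }
        return null;
    }
}

// Fake job used only to exercise the loop in this sketch.
class FakeJobSketch implements JobHandleSketch {
    private final int failAfterPolls;     // negative means the counter is never set
    private final int completeAfterPolls; // job finishes after this many polls
    private int polls = 0;
    boolean killed = false;

    FakeJobSketch(int failAfterPolls, int completeAfterPolls) {
        this.failAfterPolls = failAfterPolls;
        this.completeAfterPolls = completeAfterPolls;
    }

    public boolean isComplete() {
        return killed || polls >= completeAfterPolls;
    }

    public long fatalErrorCounter() {
        polls++;
        return (failAfterPolls >= 0 && polls > failAfterPolls) ? 1 : 0;
    }

    public void kill() {
        killed = true;
    }
}
```

Because the client reacts to the counter rather than to task failure, the job stops as soon as the client observes the counter, which is the point of the proposal.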
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.