hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-988) mapjoin should throw an error if the input is too large
Date Wed, 06 Jan 2010 01:11:54 GMT

     [ https://issues.apache.org/jira/browse/HIVE-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-988:

    Attachment: HIVE-988.patch

Uploading HIVE-988.patch. Changes include:
1) Add a FATAL_ERROR counter for each operator.
2) Increment the FATAL_ERROR counter in MapJoinOperator.
3) Modify Operator.process() to check for a fatal error and return without doing anything if one has occurred.
4) In ExecDriver.progress(), check the FATAL_ERROR counter in each operator and get the error.
5) Kill the whole job if FATAL_ERROR is set.
6) Many changes to the TestParser due to the addition of the FATAL_ERROR counter.
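
For readers who have not opened the patch, changes 1) to 3) can be pictured roughly as follows. This is only a sketch under assumptions, not the code in HIVE-988.patch: OperatorSketch, MapJoinSketch, processRow and maxRows are hypothetical names, and a plain AtomicLong stands in for the FATAL_ERROR counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class FatalErrorSketch {
    /** Hypothetical stand-in for an operator; not Hive's real Operator class. */
    static class OperatorSketch {
        // Stands in for the per-operator FATAL_ERROR counter (change 1).
        final AtomicLong fatalError = new AtomicLong();
        final List<String> emitted = new ArrayList<>();

        /** Change 3: return without doing any work once a fatal error is recorded. */
        final void process(String row) {
            if (fatalError.get() > 0) {
                return;
            }
            processRow(row);
        }

        void processRow(String row) {
            emitted.add(row);
        }
    }

    /** Change 2: a map-join-like operator that records a fatal error past a row limit. */
    static class MapJoinSketch extends OperatorSketch {
        final long maxRows; // assumed threshold, chosen by the caller
        long rowsSeen = 0;

        MapJoinSketch(long maxRows) {
            this.maxRows = maxRows;
        }

        @Override
        void processRow(String row) {
            if (++rowsSeen > maxRows) {
                fatalError.incrementAndGet();
                return;
            }
            super.processRow(row);
        }
    }

    public static void main(String[] args) {
        MapJoinSketch op = new MapJoinSketch(2);
        for (String r : new String[] {"a", "b", "c", "d"}) {
            op.process(r);
        }
        // Rows "a" and "b" are emitted; "c" trips the counter; "d" is skipped.
        System.out.println(op.emitted + " fatal=" + op.fatalError.get());
    }
}
```

Putting the check in a final process() wrapper means every operator skips further rows the same way once the counter is non-zero, without each subclass repeating the test.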

> mapjoin should throw an error if the input is too large
> -------------------------------------------------------
>                 Key: HIVE-988
>                 URL: https://issues.apache.org/jira/browse/HIVE-988
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Ning Zhang
>             Fix For: 0.5.0
>         Attachments: HIVE-988.patch
> If the input to the map join is larger than a specific threshold, it may lead to a very slow execution of the join.
> It is better to throw an error and let the user redo the query as a non-map-join query.
> However, the current map-reduce framework will retry the mapper 4 times before actually killing the job.
> Based on an offline discussion with Dhruba, Ning and myself, we came up with the following:
> Keep a threshold in the mapper for the number of rows to be processed for map-join. If the number of rows
> exceeds that threshold, set a counter and kill that mapper.
> The client (ExecDriver) monitors the job continuously. If this counter is set, it kills the job and also
> shows an appropriate error message to the user, so that the query can be retried without the map join.
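
The client-side half of the proposal quoted above (poll the job, kill it when the counter is set, report an error) might look roughly like the sketch below. It is illustrative only: JobHandle, FakeJob, monitor and the message text are hypothetical and do not correspond to the Hadoop or ExecDriver APIs.

```java
public class JobMonitorSketch {
    /** Hypothetical view of a running job; not the Hadoop job-client API. */
    interface JobHandle {
        boolean isComplete();
        long fatalErrorCount();
        void kill();
    }

    static final String ERROR_MESSAGE =
        "Map join input exceeded the row threshold; retry the query without the map join.";

    /**
     * Polls the job until it completes. If the fatal-error counter is set,
     * kills the job and returns an error message; otherwise returns null.
     */
    static String monitor(JobHandle job, long pollMillis) {
        while (!job.isComplete()) {
            if (job.fatalErrorCount() > 0) {
                job.kill(); // stop the job instead of waiting for mapper retries
                return ERROR_MESSAGE;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;
    }

    /** Fake job for the demo: the counter becomes non-zero on the third poll. */
    static class FakeJob implements JobHandle {
        int polls = 0;
        boolean killed = false;

        public boolean isComplete() { return killed; }
        public long fatalErrorCount() { return ++polls >= 3 ? 1 : 0; }
        public void kill() { killed = true; }
    }

    public static void main(String[] args) {
        FakeJob job = new FakeJob();
        String message = monitor(job, 10);
        System.out.println("killed=" + job.killed + " message=" + message);
    }
}
```

Because the client kills the job as soon as it sees the counter, the user gets the error without waiting for the framework to retry the failed mapper.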

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
