Mailing-List: contact hive-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hive-dev@hadoop.apache.org
Message-ID: <1362444943.61061262757114466.JavaMail.jira@brutus.apache.org>
Date: Wed, 6 Jan 2010 05:51:54 +0000 (UTC)
From: "Namit Jain (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Subject: [jira] Commented: (HIVE-988) mapjoin should throw an error if the
 input is too large
In-Reply-To: <800814905.1260829405861.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796997#action_12796997 ] 

Namit Jain commented on HIVE-988:
---------------------------------

I meant that on the job tracker page, the mapper will be shown as succeeded. I guess that might be acceptable.
We definitely need to fix the race condition; otherwise it is good.

> mapjoin should throw an error if the input is too large
> -------------------------------------------------------
>
>                 Key: HIVE-988
>                 URL: https://issues.apache.org/jira/browse/HIVE-988
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Ning Zhang
>             Fix For: 0.5.0
>
>         Attachments: HIVE-988.patch, HIVE-988_2.patch
>
>
> If the input to the map join is larger than a specific threshold, it may lead to a very slow execution of the join.
> It is better to throw an error, and let the user redo his query as a non map-join query.
> However, the current map-reduce framework will retry the mapper 4 times before actually killing the job.
> Based on an offline discussion among Dhruba, Ning and myself, we came up with the following algorithm:
> Keep a threshold in the mapper for the number of rows to be processed for map-join. If the number of rows
> exceeds that threshold, set a counter and kill that mapper.
> The client (ExecDriver) monitors that job continuously - if this counter is set, it kills the job and also
> shows an appropriate error message to the user, so that he can retry the query without the map join.
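
A rough sketch of the algorithm described above, in plain Java with no Hadoop dependencies. This is not the code from the attached patches: the class name, the method names, the shared AtomicLong standing in for a job counter, and the error text are all assumptions made for illustration. The mapper here simply stops reading input once the threshold is crossed, which is one way it could still end up reported as succeeded.

```java
import java.util.concurrent.atomic.AtomicLong;

public class MapJoinGuardSketch {

    /** Hypothetical mapper-side guard: counts rows and flags the job when over the limit. */
    static class MapJoinMapper {
        private final long maxRows;
        private final AtomicLong fatalCounter; // stand-in for a job-level counter
        private long rowsSeen = 0;

        MapJoinMapper(long maxRows, AtomicLong fatalCounter) {
            this.maxRows = maxRows;
            this.fatalCounter = fatalCounter;
        }

        /** Returns false once the threshold is exceeded, telling the caller to stop feeding rows. */
        boolean processRow(Object row) {
            rowsSeen++;
            if (rowsSeen > maxRows) {
                fatalCounter.incrementAndGet(); // "set a counter"
                return false;                   // stop this mapper without failing it
            }
            // ... the actual map-join work for this row would go here ...
            return true;
        }
    }

    /** Hypothetical client-side check, run on every polling iteration of the monitor loop. */
    static boolean shouldKillJob(AtomicLong fatalCounter) {
        return fatalCounter.get() > 0;
    }

    /** Simulates one job: a single mapper over inputRows rows, followed by the client check. */
    static String runJob(long maxRows, int inputRows) {
        AtomicLong fatalCounter = new AtomicLong(0);
        MapJoinMapper mapper = new MapJoinMapper(maxRows, fatalCounter);
        for (int i = 0; i < inputRows; i++) {
            if (!mapper.processRow(i)) {
                break; // mapper gave up; no exception, so the framework sees no failed attempt
            }
        }
        if (shouldKillJob(fatalCounter)) {
            return "KILLED: map-join input exceeded " + maxRows
                    + " rows; retry the query without the map join";
        }
        return "SUCCEEDED";
    }

    public static void main(String[] args) {
        System.out.println(runJob(1000, 10));   // small input, job completes
        System.out.println(runJob(1000, 5000)); // large input, client kills the job
    }
}
```

In a real job the counter would be read by the client while mappers are still running, so the check would sit inside the polling loop rather than after the mapper finishes as it does in this single-threaded simulation.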
