Mailing-List: contact hive-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hive-dev@hadoop.apache.org
Message-ID: <1039646200.1236799610419.JavaMail.jira@brutus>
Date: Wed, 11 Mar 2009 12:26:50 -0700 (PDT)
From: "Zheng Shao (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Subject: [jira] Commented: (HIVE-339) [Hive] problem in count distinct in
 1mapreduce job with map side aggregation
In-Reply-To: <992513386.1236750650417.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681002#action_12681002 ] 

Zheng Shao commented on HIVE-339:
---------------------------------

Agree.

In short, for DISTINCT aggregations in a 1-map/reduce plan, we de-duplicate as much as we can in the map phase, and the whole aggregation is done in the reduce phase.
That's why the reduce phase also needs iterate().
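
To make this concrete, here is a minimal sketch in Python. It is not Hive's actual code: the names map_side_dedup, CountDistinct, reduce_side and run, and the max_entries flush policy, are all hypothetical and only illustrate the idea. The mapper drops only the duplicates it happens to still hold in memory, so duplicates can reach the reducer, and the reducer has to call iterate() on every raw row rather than merge partial counts.

```python
from itertools import groupby


def map_side_dedup(rows, max_entries=2):
    """Best-effort de-duplication of (group_key, value) pairs in one mapper.

    Assumption for illustration: the in-memory set is cleared when it is
    full, so a pair seen before the flush can be emitted again later.
    """
    seen = set()
    for pair in rows:
        if pair in seen:
            continue  # duplicate still in memory: drop it
        if len(seen) >= max_entries:
            seen.clear()  # hypothetical flush when memory is full
        seen.add(pair)
        yield pair


class CountDistinct:
    """Reduce-side evaluator: counts value changes in sorted input."""

    def __init__(self):
        self.count = 0
        self.last = object()  # sentinel that equals no real value

    def iterate(self, value):
        # Rows arrive sorted by (key, value), so a duplicate is always
        # adjacent to its earlier copy.
        if value != self.last:
            self.count += 1
            self.last = value

    def terminate(self):
        return self.count


def reduce_side(pairs):
    """Sort like the shuffle would, then run iterate() on every row."""
    result = {}
    for key, group in groupby(sorted(pairs), key=lambda p: p[0]):
        agg = CountDistinct()
        for _, value in group:
            agg.iterate(value)
        result[key] = agg.terminate()
    return result


def run(mapper_inputs):
    """Simulate several mappers feeding a single reducer."""
    shuffled = [p for rows in mapper_inputs for p in map_side_dedup(rows)]
    return reduce_side(shuffled)
```

With two mappers whose inputs overlap, the mappers alone cannot produce a correct count, because the same value can come from different mappers or reappear after a flush. Only the reducer sees all rows for a key, which is why the full aggregation happens there.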

> [Hive] problem in count distinct in 1mapreduce job with map side aggregation
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-339
>                 URL: https://issues.apache.org/jira/browse/HIVE-339
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.339.1.patch, hive.339.2.patch, hive.339.3.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.