Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Wed, 18 May 2011 00:35:47 +0000 (UTC)
From: "Ning Zhang (JIRA)"
To: hive-dev@hadoop.apache.org
Message-ID: <517396397.21195.1305678947394.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <806651160.17002.1304381883086.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Assigned] (HIVE-2144) reduce workload generated by JDBCStatsPublisher
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[ 
https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang reassigned HIVE-2144:
--------------------------------

    Assignee: Tomasz Nykiel  (was: Ning Zhang)

> reduce workload generated by JDBCStatsPublisher
> -----------------------------------------------
>
>                 Key: HIVE-2144
>                 URL: https://issues.apache.org/jira/browse/HIVE-2144
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Tomasz Nykiel
>
> In JDBCStatsPublisher, we first try a SELECT query to see if the specific ID was inserted by another task (most likely a speculative or previously failed task). Depending on whether the ID is there, an INSERT or UPDATE query is issued. So there are basically two queries per row inserted into the intermediate stats table. This workload could be halved if we insert the row anyway (it is very rare that IDs are duplicated) and use a different SQL query in the aggregation phase to dedup the ID (e.g., using group-by and max()). The benefit is that, even though the aggregation query is more expensive, it is run only once per query.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
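
The proposal in the description (insert unconditionally, dedup at aggregation time) can be sketched as follows. This is only an illustration under stated assumptions, not the JDBCStatsPublisher code: the table name `stats_tmp`, the columns `id` and `row_count`, the ID format, and the prefix filter are all hypothetical, and SQLite stands in for whatever database the publisher actually talks to over JDBC.

```python
import sqlite3


def open_stats_db():
    # Hypothetical schema: the real intermediate stats table is not shown
    # in the message above. There is deliberately no UNIQUE constraint on
    # id, so a duplicate publish does not fail.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats_tmp (id TEXT NOT NULL, row_count INTEGER NOT NULL)"
    )
    return conn


def publish(conn, task_id, row_count):
    """Publish with one statement per row: plain INSERT, no SELECT first."""
    conn.execute(
        "INSERT INTO stats_tmp (id, row_count) VALUES (?, ?)",
        (task_id, row_count),
    )


def aggregate(conn, id_prefix):
    """Collapse duplicate IDs with GROUP BY / MAX(), then sum across IDs."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(deduped.row_count), 0)
        FROM (
            SELECT id, MAX(row_count) AS row_count
            FROM stats_tmp
            WHERE id LIKE ? || '%'   -- assumed: IDs share a per-query prefix
            GROUP BY id
        ) AS deduped
        """,
        (id_prefix,),
    ).fetchone()
    return row[0]


conn = open_stats_db()
publish(conn, "job1/task0", 100)
publish(conn, "job1/task1", 250)
publish(conn, "job1/task0", 100)  # duplicate from a speculative attempt
total = aggregate(conn, "job1/")
stored_rows = conn.execute("SELECT COUNT(*) FROM stats_tmp").fetchone()[0]
```

The duplicate row is stored but counted once, so the publish path costs a single statement per row and the extra work moves into the one aggregation query. MAX() is used because the description suggests it; whether it is the right way to pick among duplicates for every statistic is not settled by the message.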