Date: Thu, 5 Sep 2013 14:47:52 +0000 (UTC)
From: "Dag H. Wanvik (JIRA)"
To: derby-dev@db.apache.org
Subject: [jira] [Updated] (DERBY-6227) Distinct aggregates don't work well with territory-based collation

     [ https://issues.apache.org/jira/browse/DERBY-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-6227:
---------------------------------
    Description:
When working on DERBY-5840, I noticed that GroupedAggregateResultSet does duplicate elimination by comparing the java.lang.String representations of the values. With territory-based collation, two values that have different java.lang.String representations may still need to be considered duplicates, and in that case this logic produces incorrect results.

Example:

ij version 10.10
ij> connect 'jdbc:derby:memory:db;territory=en_US;collation=TERRITORY_BASED:PRIMARY;create=true';
ij> create table t(i int, s varchar(10));
0 rows inserted/updated/deleted
ij> insert into t values (1, 'a'), (1, 'a'), (2, 'b'), (2, 'B'), (3, 'a'), (3, 'A'), (3, 'b'), (3, 'B'), (3, 'c');
9 rows inserted/updated/deleted
ij> select distinct s from t;
S
----------
b
a
c

3 rows selected
ij> select i, count(distinct s) from t group by i;
I          |2
-----------------------
1          |1
2          |2
3          |5

3 rows selected

I would have expected the last query to return

(1, 1)
(2, 1)
(3, 3)

> Distinct aggregates don't work well with territory-based collation
> ------------------------------------------------------------------
>
>                 Key: DERBY-6227
>                 URL: https://issues.apache.org/jira/browse/DERBY-6227
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0, 10.6.2.1, 10.7.1.1, 10.8.1.2, 10.8.2.2, 10.8.3.0, 10.9.1.0, 10.10.1.1
>            Reporter: Knut Anders Hatlen
>              Labels: derby_triage10_11
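
For illustration only, the mismatch described above can be reproduced outside Derby with plain java.text.Collator. This is a minimal sketch, not Derby's code: the class DistinctCollationSketch and its methods are hypothetical names, and it assumes that a PRIMARY-strength en_US collator is a fair stand-in for collation=TERRITORY_BASED:PRIMARY with territory=en_US. It counts distinct values once by String equality and once by collation order, using the same per-group values as the ij example.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; it is not part of Derby.
class DistinctCollationSketch {

    // Counts distinct values with String.equals/hashCode, i.e. by comparing
    // the java.lang.String representation, as the issue describes.
    static int countDistinctByString(List<String> values) {
        return new HashSet<String>(values).size();
    }

    // Counts distinct values by treating two strings as duplicates whenever
    // the collator compares them as equal.
    static int countDistinctByCollation(List<String> values, Collator collator) {
        Set<String> seen = new TreeSet<String>(collator);
        seen.addAll(values);
        return seen.size();
    }

    // Assumption: approximates territory=en_US, TERRITORY_BASED:PRIMARY.
    static Collator primaryEnUs() {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    public static void main(String[] args) {
        Collator collator = primaryEnUs();
        // Values of column s for i = 1, 2, 3 in the ij example.
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("a", "a"),
                Arrays.asList("b", "B"),
                Arrays.asList("a", "A", "b", "B", "c"));
        for (int i = 0; i < groups.size(); i++) {
            List<String> group = groups.get(i);
            System.out.println((i + 1)
                    + " | by String: " + countDistinctByString(group)
                    + " | by collation: " + countDistinctByCollation(group, collator));
        }
    }
}
```

Under that assumption, the "by String" column should line up with the counts ij reports (1, 2, 5) and the "by collation" column with the counts the reporter expected (1, 1, 3).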