db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mamta A. Satoor (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation
Date Wed, 07 Dec 2011 05:48:40 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164165#comment-13164165
] 

Mamta A. Satoor commented on DERBY-4631:
----------------------------------------

I debugged the code to figure out what (and where) are we doing in the code which causes us
to give wrong results for join column in case of territory based database and right outer
join with NATURAL or USING clause. As Knut pointed out earlier in this jira, as per the SQL
spec, "the join columns in a natural join or in a named columns join should be added to the
select list by coalescing the column from the left table with the column from the right table.
" What I have found is that Derby decides to pick up join column's value from the left table
when we are working with natural left outer join and it picks up the join column's value from
the right table when we are working with natural right outer join. This is not a problem when
we are dealing with non-territory based databases but the assumption to rely on just one table's
join column is incorrect when working with territory based databases. Following is the test
case I used for debugging which further explains Derby's current implementation.

connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';

create table big(x varchar(5)); 
insert into big values 'A','B', null; 
create table small(x varchar(5)); 
insert into small values 'b','c', null; 
select * from small t1 natural left outer join big t2;
select * from small t1 natural right outer join big t2;

For both natural left outer join and natural right outer join, at execution time, we create
a merged row which has columns merged from the left and right tables. The column(in my example,
there is only one column)s in the select sql maps to a column from the merged row. The mapping
is determined at the sql compile phase. 
In the case of 'select * from small t1 natural left outer join big t2', there will be three
merged rows with 2 columns each
'b'  'B'
'c'  null
null null
And for natural left outer join, the generated code has column in the select SQL map to the
first column in the merged row. This will always work fine even in a territory based database
because as per the SQL standards, the column x should be equivalent to the return value of
coalesce(t1.x, t2.x). Since we are working with left outer join, then if the first column
in the merged row is null, then even the 2nd column(from the right table) will be null and
hence it is ok to always pick up the value from the 1st column in the merged row. This mapping
will always lead column x to have the same value as coalesce(t1.x, t2.x).

But for a territory based database, we can't count on a logic like that for natural right
outer join. The way Derby works right now, the column x in the select always gets mapped to
the 2nd column in the merged row. In the case of 'select * from small t1 natural right outer
join big t2', there will be three merged rows with 2 columns each
null 'A'
'b'  'B'
null null
And for natural right outer join, the generated code has column in the select SQL map to the
second column in the merged row. This will work fine in a non-territory database, because
if column 1 in the merged row has a non-null value, then it will always be the same value
as the column 2 in the merged row. But in our example, with territor based database(with SECONDARY
strength, meaning it is case insensitive comparison), values 'B' and 'b' are considered equal.
Hence the coalesce(t1.x, t2,x) will not be same as value in the 2nd column of the merged row.
For natural right outer join with the data given in the example above, 
coalesce(t1.x, t2,x) will return 'A', 'b' and null. But with the mapping of column x in the
SELECT to the 2nd column in the merged row will return 'A', 'B' and null thus returning data
which does not comply with SQL standard which says that column x's value should be the return
value of coalesce(t1.x, t2.x). So it seems like may be we need some of kind projection in
case of natural right outer join (rather than simple column mapping to the 2nd column which
is what happens right now) so that we look at both the columns in the merged row to determine
the value of column x.

Hope this explanation helps understand what Derby is doing internally and based on that, we
can come up with some proposal to fix the issue.

                
> Wrong join column returned by right outer join with NATURAL or USING and territory-based
collation
> --------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4631
>                 URL: https://issues.apache.org/jira/browse/DERBY-4631
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0
>            Reporter: Knut Anders Hatlen
>              Labels: derby_triage10_8
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right outer join
big t2;
> X    |X    |X    |4    
> -----------------------
> A    |NULL |A    |A    
> B    |b    |B    |b    
> C    |c    |C    |c    
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

Mime
View raw message