db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bernt M. Johnsen" <Bernt.John...@Sun.COM>
Subject Derby, SQL 2003, UNIQUE constraints, unique indices and NULL (long)
Date Wed, 02 Nov 2005 11:40:53 GMT
Contents:
============================================================
0. Introduction
1. The standard
2. Conclusion
Appendix A: A note on implementation


0. Introduction
============================================================

There's been a lot of discusion around the use of UNIQUE and
nullability in Derby so I decided to walk through the standards and
find what they say (regardless what Sybase, Informix, DB2, Oracle and
other databases may have implemented) and then draw my conclusions.


1. The standard
============================================================

The definition of NULL in ISO/IEC 9075-1:2003 (E) SQL 2003 Part 1 -
SQL/Framework, p. 15 is:

    4.4.2 The null value

    Every data type includes a special value, called the null value,
    sometimes denoted by the keyword NULL. This value differs from
    other values in the following respects:

    -- Since the null value is in every data type, the data type of
       the null value implied by the keyword NULL cannot be inferred;
       hence NULL can be used to denote the null value only in certain
       contexts, rather than everywhere that a literal is permitted.

    -- Although the null value is neither equal to any other value nor
       not equal to any other value -- it is unknown whether or not it
       is equal to any given value -- in some contexts, multiple null
       values are treated together; for example, the <group by clause>
       treats all null values together.

Furthermore, whether to allow nullable columns is an optional feature
and not a part of SQL 2003 Mandatory (nor of SQL 99 Core): (ISO/IEC
9075-2:2003 (E) SQL 2003 Part 2 - SQL/Foundation, p. 1161)

    189) Specifications for Feature T591, "UNIQUE constraints of
         possibly null columns": 

         a) Subclause 11.7, "<unique constraint definition>":

            i) Without Feature T591, "UNIQUE constraints of possibly
            null columns", in conforming SQL language, if UNIQUE is
            specified, then the <column definition> for each column
            whose <column name> is contained in the <unique column
            list> shall contain NOT NULL. 

            NOTE 484 - The Conformance Rules of Subclause 9.10,
            "Grouping operations", also apply.


The UNIQUE constraint is defined as follows (ISO/IEC 9075-2:2003 (E)
SQL 2003 Part 2 - SQL/Foundation, p. 568):

    2) The unique constraint is not satisfied if and only if
           EXISTS ( SELECT * FROM TN WHERE NOT ( SC ) )
       is True.

And SC is defined as (ISO/IEC 9075-2:2003 (E) SQL 2003 Part 2 -
SQL/Foundation, p. 567):

    UNIQUE ( SELECT UCL FROM TN)

where UCL is the unique column list and TN the table name.

The unique predicate is defined as follows (ISO/IEC 9075-2:2003 (E)
SQL 2003 Part 2 - SQL/Foundation, p. 422):

    2) If there are no two rows in T such that the value of each
       column in one row is non-null and is not distinct from the
       value of the corresponding column in the other row, then the
       result of the <unique predicate> is True; otherwise, the result
       of the <unique predicate> is False.

where T is the argument to UNIQUE (the table sub-query).

The definition of distinct is (ISO/IEC 9075-2:2003 (E) SQL 2003 Part 2
- SQL/Foundation, p. 29):

    3.1.6.8 distinct (of a pair of comparable values): Capable of
            being distinguished within a given context.  Informally,
            not equal, not both null. A null value and a non-null
            value are distinct.

    For two non-null values, the following rules apply:
    --  Two values of predefined type or reference type are distinct
        if and only if they are not equal. 

(I do not bother to quote the definition of equal, since I hope it is
understood by most programmers)


2. Conclusion
==================================================

When you are finished parsing the negations in the definition of the
unique predicate the you will find that the standard allows as many
NULL's as you wish under an unique constraint. *BUT*: you can't have a
unique constraint on columns which do not have the NOT NULL constraint
unless optional feature T591 is implemented. Thus, Derby is perfectly
compliant with SQL 2003 Mandatory.

Indices are *NOT* a part of the SQL standard, so no one can claim that
Derby's implementation of unique indices is wrong or right.

One can, however, conclude that Derby's implementation of unique
indices is not conformant with the SQL unique predicate.

Personally i think that it is unfortunate that Derby's unique indices
are not compliant with the SQL 2003 standard unique predicate.


Appendix A: A note on implementation
================================================== 

For those wondering how unique indices with multiple null values may
be implemented, it can be done pretty simple: The columns with null
values are not part of the index. This will if course affect the
execution plan.

Example:

CREATE TABLE t (i INTEGER, a INTEGER NOT NULL, b INTEGER);
CREATE UNIQUE INDEX idx1 ON t(i);
CREATE UNIQUE INDEX idx2 ON t(a, b);

SELECT i FROM t WHERE i=1;       -- Use index idx1
SELECT i FROM t WHERE i IS NULL; -- Do a table scan
SELECT a FROM t WHERE a=5;       -- Do a table scan
SELECT a FROM t WHERE a=5 AND b IS NOT NULL;
                                 -- Use index idx2

There is of course are other ways of implementing unique indices with
multiple null values.
-- 
Bernt Marius Johnsen, Database Technology Group, 
Sun Microsystems, Trondheim, Norway

Mime
View raw message