Contents:
============================================================
0. Introduction
1. The standard
2. Conclusion
Appendix A: A note on implementation
0. Introduction
============================================================
There's been a lot of discusion around the use of UNIQUE and
nullability in Derby so I decided to walk through the standards and
find what they say (regardless what Sybase, Informix, DB2, Oracle and
other databases may have implemented) and then draw my conclusions.
1. The standard
============================================================
The definition of NULL in ISO/IEC 9075-1:2003 (E) SQL 2003 Part 1 -
SQL/Framework, p. 15 is:
4.4.2 The null value
Every data type includes a special value, called the null value,
sometimes denoted by the keyword NULL. This value differs from
other values in the following respects:
-- Since the null value is in every data type, the data type of
the null value implied by the keyword NULL cannot be inferred;
hence NULL can be used to denote the null value only in certain
contexts, rather than everywhere that a literal is permitted.
-- Although the null value is neither equal to any other value nor
not equal to any other value -- it is unknown whether or not it
is equal to any given value -- in some contexts, multiple null
values are treated together; for example, the
treats all null values together.
Furthermore, whether to allow nullable columns is an optional feature
and not a part of SQL 2003 Mandatory (nor of SQL 99 Core): (ISO/IEC
9075-2:2003 (E) SQL 2003 Part 2 - SQL/Foundation, p. 1161)
189) Specifications for Feature T591, "UNIQUE constraints of
possibly null columns":
a) Subclause 11.7, "":
i) Without Feature T591, "UNIQUE constraints of possibly
null columns", in conforming SQL language, if UNIQUE is
specified, then the for each column
whose is contained in the shall contain NOT NULL.
NOTE 484 - The Conformance Rules of Subclause 9.10,
"Grouping operations", also apply.
The UNIQUE constraint is defined as follows (ISO/IEC 9075-2:2003 (E)
SQL 2003 Part 2 - SQL/Foundation, p. 568):
2) The unique constraint is not satisfied if and only if
EXISTS ( SELECT * FROM TN WHERE NOT ( SC ) )
is True.
And SC is defined as (ISO/IEC 9075-2:2003 (E) SQL 2003 Part 2 -
SQL/Foundation, p. 567):
UNIQUE ( SELECT UCL FROM TN)
where UCL is the unique column list and TN the table name.
The unique predicate is defined as follows (ISO/IEC 9075-2:2003 (E)
SQL 2003 Part 2 - SQL/Foundation, p. 422):
2) If there are no two rows in T such that the value of each
column in one row is non-null and is not distinct from the
value of the corresponding column in the other row, then the
result of the is True; otherwise, the result
of the is False.
where T is the argument to UNIQUE (the table sub-query).
The definition of distinct is (ISO/IEC 9075-2:2003 (E) SQL 2003 Part 2
- SQL/Foundation, p. 29):
3.1.6.8 distinct (of a pair of comparable values): Capable of
being distinguished within a given context. Informally,
not equal, not both null. A null value and a non-null
value are distinct.
For two non-null values, the following rules apply:
-- Two values of predefined type or reference type are distinct
if and only if they are not equal.
(I do not bother to quote the definition of equal, since I hope it is
understood by most programmers)
2. Conclusion
==================================================
When you are finished parsing the negations in the definition of the
unique predicate the you will find that the standard allows as many
NULL's as you wish under an unique constraint. *BUT*: you can't have a
unique constraint on columns which do not have the NOT NULL constraint
unless optional feature T591 is implemented. Thus, Derby is perfectly
compliant with SQL 2003 Mandatory.
Indices are *NOT* a part of the SQL standard, so no one can claim that
Derby's implementation of unique indices is wrong or right.
One can, however, conclude that Derby's implementation of unique
indices is not conformant with the SQL unique predicate.
Personally i think that it is unfortunate that Derby's unique indices
are not compliant with the SQL 2003 standard unique predicate.
Appendix A: A note on implementation
==================================================
For those wondering how unique indices with multiple null values may
be implemented, it can be done pretty simple: The columns with null
values are not part of the index. This will if course affect the
execution plan.
Example:
CREATE TABLE t (i INTEGER, a INTEGER NOT NULL, b INTEGER);
CREATE UNIQUE INDEX idx1 ON t(i);
CREATE UNIQUE INDEX idx2 ON t(a, b);
SELECT i FROM t WHERE i=1; -- Use index idx1
SELECT i FROM t WHERE i IS NULL; -- Do a table scan
SELECT a FROM t WHERE a=5; -- Do a table scan
SELECT a FROM t WHERE a=5 AND b IS NOT NULL;
-- Use index idx2
There is of course are other ways of implementing unique indices with
multiple null values.
--
Bernt Marius Johnsen, Database Technology Group,
Sun Microsystems, Trondheim, Norway