db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rick Hillegas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5235) Remove the artificial limit on the length of VARCHAR values, allowing them to be java.lang.Integer.MAX_VALUE long
Date Wed, 18 May 2011 19:36:47 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035611#comment-13035611
] 

Rick Hillegas commented on DERBY-5235:
--------------------------------------

Why have longer varchars?

It's true that you might want to sort or index these values. But you might merely want to
ask whether longStringA = longStringB. Derby won't let you do that with LONG VARCHAR or CLOB.
Whatever arbitrary limit you put on the length of your Strings, someone will need a String
that is just a little longer. Arbitrary limits are the hallmark of bad language design. On
this point, a broad consensus has evolved, as you can see if you google up the keywords "arbitrary
limits programming".

What does data look like in the real world?

I can only offer anecdotal evidence here. In writing Derby-powered apps for myself, I am often
frustrated by having to choose between a String type which is comparable and a String type
which is big enough to hold the occasional large value. Some data distributions are homogeneous.
But others fit a normal distribution, with a big lump of similar values in the middle and
bizarre outliers in the tails. I think it is not uncommon to need a String column which usually
holds small values but which can balloon up in edge or error cases.

What direction should we go in?

SQL has too many String types. We are stuck with the annoying semantic differences between
CHAR and VARCHAR. Luckily for us, the Unicode basis of Java Strings means that we don't have
to worry about NCHAR, NVARCHAR, LONG NVARCHAR, and NCLOB too.

I would like to see the differences among our String types vanish, not harden. I don't think
customers want to have to trade off compactness against performance against usability. Ideally,
there would only be one String type and the database would be smart enough to store and process
it efficiently.

For compliance and compatibility reasons, we'll never eliminate the redundancy in our String
types. But CLOB, LONG VARCHAR, and VARCHAR( 2147483647 ) should be synonyms. The customer
should get the same great usability and performance regardless of the String type they pick.

By the way, I like Mike's suggestions about the possibility of indexing long Strings. Hopefully
someone will want to invest in that improvement some day.


> Remove the artificial limit on the length of VARCHAR values, allowing them to be java.lang.Integer.MAX_VALUE
long
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5235
>                 URL: https://issues.apache.org/jira/browse/DERBY-5235
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.9.0.0
>            Reporter: Rick Hillegas
>
> The original Cloudscape limit for the length of VARCHAR values was java.lang.Integer.MAX_VALUE.
That is the limit in Cloudscape 5.1. Nothing in Derby should break if we restore the original
limit. The current limit is an artificial bound introduced to make Derby agree with DB2. 32672
is the upper bound on the length of a DB2 VARCHAR: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0001029.htm

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message