From: "Kristian Waagan (JIRA)"
To: derby-dev@db.apache.org
Date: Wed, 11 Apr 2007 14:23:32 -0700 (PDT)
Subject: [jira] Commented: (DERBY-2346) Provide set methods for clob for embedded driver
Message-ID: <32451068.1176326612426.JavaMail.jira@brutus>
In-Reply-To: <14969143.1171631765740.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/DERBY-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488197 ]

Kristian Waagan commented on DERBY-2346:
----------------------------------------

Regarding the UTF-8 char -> byte -> char conversion using String
methods, I don't think it is a bug. Unmappable "chars" are represented by '?' (0x3f / 63). In the snippet above, (char)56249 (0xdbb9) happens to be a high surrogate in the "High Private Use Surrogates" range (0xdb80-0xdbff). A surrogate on its own is not a valid character, and the codepoints it can form as part of a pair are reserved for private use: the Unicode standard does not define any characters for them.

You could use DataOutput/DataInput and writeUTF/readUTF, but I don't know how efficient this would be. These methods write the strings in the modified UTF-8 format, which encodes each 16-bit char separately, so the equals in the example above returns true.

I think writing your own method would be acceptable, but it would be interesting if anyone took the time to investigate the CPU/space differences (i.e. what kind of stream can we use underneath? ByteArrayOutputStream? A subclass of it that returns a reference to the byte array?)

Even though the example uses a "very special codepoint", the database should handle it. An application could potentially use it for its own custom character (not quite sure how though). Further, it seems the "UTF-8" encoding (as used in String.getBytes()) does not promise to encode all unsigned 16 bit values, but only valid Unicode characters.

I'm not very good with the Unicode terminology, so there might be errors in my comment, and important additions may be missing. Feel free to correct me.

> Provide set methods for clob for embedded driver
> ------------------------------------------------
>
> Key: DERBY-2346
> URL: https://issues.apache.org/jira/browse/DERBY-2346
> Project: Derby
> Issue Type: Sub-task
> Components: JDBC
> Affects Versions: 10.3.0.0
> Reporter: Anurag Shekhar
> Assigned To: Anurag Shekhar
> Attachments: derby-2346-only_for_review.diff, derby-2346.v1.diff
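
The two round trips discussed in the comment can be compared with a small sketch like the one below. It is not the snippet from the issue (which is not reproduced in this message) and not Derby code; the class name SurrogateRoundTrip, the method names and the test string are made up for illustration. Only the value (char)56249 is taken from the comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Derby.
class SurrogateRoundTrip {

    // char -> byte -> char using String methods and the standard UTF-8 charset.
    // Input that cannot be encoded (such as a lone surrogate) is replaced by '?'.
    static String viaStringMethods(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // char -> byte -> char using writeUTF/readUTF (modified UTF-8).
    // Every 16-bit char is encoded on its own, so a lone surrogate survives.
    static String viaModifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                out.writeUTF(s);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readUTF();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed test string: an unpaired high surrogate between two ASCII chars.
        String original = "a" + (char) 56249 + "b";

        System.out.println(original.equals(viaStringMethods(original))); // false
        System.out.println(viaStringMethods(original));                  // a?b
        System.out.println(original.equals(viaModifiedUtf8(original)));  // true
    }
}
```

Note that writeUTF prefixes the data with a two-byte length and rejects strings whose encoded form exceeds 65535 bytes, so for Clob-sized values it could only be applied chunk by chunk; a hand-written encoder, as suggested in the comment, would avoid that limit.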