db-derby-user mailing list archives

From "Regunath Balasubramanian" <regunat...@mindtree.com>
Subject Error reading CLOB
Date Tue, 11 Jul 2006 09:50:54 GMT
Hi, 
 
I chose to use Derby as an embedded DB to store text parsed/stripped from web
pages, MS Office files and PDF documents while implementing an indexing and
search solution. I need the parsed text of the document to enable search term
highlighting to produce an effective summary of search hits.
The natural choice was to use the CLOB data type. I store the contents using
PreparedStatement.setCharacterStream(column, reader) where reader is a
java.io.StringReader constructed from the java.lang.String instance
representing the entire parsed contents. I then read the contents out using
ResultSet.getClob(column).getCharacterStream().
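
The write and read path described above can be sketched roughly as follows. This is only an illustration under assumptions: the table docs(id, body) and the helper names store, load and readAll are hypothetical and not taken from the actual application, and the three-argument setCharacterStream form (with an explicit character count) is used because that is the JDBC 3.0 signature.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.sql.Clob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class ClobRoundTrip {

    // Hypothetical schema: CREATE TABLE docs (id INT PRIMARY KEY, body CLOB)
    static void store(Connection conn, int id, String text) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO docs (id, body) VALUES (?, ?)")) {
            ps.setInt(1, id);
            // The length argument is a count of characters, not bytes.
            ps.setCharacterStream(2, new StringReader(text), text.length());
            ps.executeUpdate();
        }
    }

    static String load(Connection conn, int id) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("SELECT body FROM docs WHERE id = ?")) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null; // no such row
                }
                Clob clob = rs.getClob(1);
                if (clob == null) {
                    return null; // column was SQL NULL
                }
                // Drain the stream before the ResultSet is closed.
                return readAll(clob.getCharacterStream());
            }
        }
    }

    // Reads a Reader to the end and closes it.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The sketch reads the whole stream inside load, so the caller never holds a Clob or Reader after the statement has been closed.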
 
Writes always succeed, but reads fail for a few documents. What surprises me
is that I both write and read through the Derby classes and therefore
naturally expect the round trip to work. The error occurs in the
fillBuffer() method of the UTF8Reader class, which throws a
UTFDataFormatException.
 
I made a few frustrating attempts to get it to work. I tried constructing
the parsed string using different encodings (UTF-8, ISO-8859-1) at write
time. I tried reading it as a binary stream, which failed with a nice
exception stating that I was trying to read a CLOB as binary. I tried reading
it as an ASCII stream, which failed with the same data format exception.
 
Finally I decided to write the contents as a BLOB instead. The bytes for
writing are constructed using String.getBytes(). I read the contents back
using Blob.getBytes() and then construct the String using new String(byte[]).
This works!
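
For reference, the BLOB workaround might look like the sketch below. It differs from what I described in one respect, which is an assumption on my part: it names UTF-8 explicitly, because the no-argument String.getBytes() and new String(byte[]) use the platform default charset, so the round trip is only safe when the writer and the reader share that default. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;

final class BlobTextCodec {

    // Encode the parsed text with a fixed charset rather than the platform default.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same fixed charset used when writing.
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Blob positions are 1-based; the cast assumes the value fits in an int.
    static String fromBlob(Blob blob) throws SQLException {
        return fromBytes(blob.getBytes(1, (int) blob.length()));
    }
}
```

On the write side the result of toBytes(text) would be passed to PreparedStatement.setBytes, and on the read side the Blob from ResultSet.getBlob would be passed to fromBlob.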
 
I wonder why Derby's UTF8Reader failed. I have the above-mentioned
workaround, but would like to know if there is an alternative.
 
Cheers!
Regu
