db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sunitha Kambhampati (JIRA)" <derby-...@db.apache.org>
Subject [jira] Updated: (DERBY-302) Takes over 3 minutes to insert a 500kb String into CLOB
Date Fri, 10 Jun 2005 17:19:49 GMT
     [ http://issues.apache.org/jira/browse/DERBY-302?page=all ]

Sunitha Kambhampati updated DERBY-302:

    Attachment: fixclobperf_0609.txt

Currently in derby,  for an insert on a clob using setCharacterStream what will happen is
, the entire stream will be materialized into a char array and sent to store for the insert.
 ( we should not have to stream here. I will file another jira issue for this and put in all
information I learnt)

Given this is how inserts for large clobs are happening, the performance issue analysis is
as follows:
--  profiler run shows that most time is spent in SQLChar.readExternal which is where the
materialization into a char array for the user's input stream happens.  The growth of this
array happens gradually till the entire stream is materialized into the array.  Below code
snippet shows by how much the array is grown each time when it realizes it has to read more
bytes from the stream.
The dvd hierarchy for clob is  -  SQLClob ( dvd) extends SQLVarChar extends SQLChar.

So in SQLChar.readExternal
    int growby = in.available();
   if(growby < 64)
       growby = 64
 and then an allocation and an arraycopy to the new allocated array.

--  In the code snippet,  'in' is the wrapper around the user's stream which is ReaderToUTF8Stream
.   ReaderToUTF8Stream extends InputStream and  does not override available() method . As
per the spec, InputStream.available() returns 0.

-- Thus each time, the array growth is by 64 bytes which is obviously not performant.  so
for a 500k clob insert, this would mean allocation & arraycopy steps happen  ~8000 times.

-- The ReaderToUTF8Stream that has the user's stream reads from the stream and does the utf8
conversion and puts it in a 4k array.  I think it is reasonable to have a 32k buffer to store
this information for clobs.

Although I think there seems to be more possible optimizations in this area,  I prefer the
incremental approach too :)  so this patch  is a first step towards fixing the insert clob
performance in the current system.

Fix includes:
-- enhanced the way the array was grown to keep the original  64 bytes for char ( seems reasonable
given the upper limit for char) but override it to have  4k for varchar and clobs.
-- override available() in ReaderToUTF8Stream to return a better estimate of how many bytes
can be read.

svn stat
M      java\engine\org\apache\derby\impl\jdbc\ReaderToUTF8Stream.java
M      java\engine\org\apache\derby\iapi\services\io\LimitReader.java
M      java\engine\org\apache\derby\iapi\types\SQLChar.java
M      java\engine\org\apache\derby\iapi\types\SQLVarchar.java

-- ran derbyall ok with sun jvm.

--  I can add a test and compare times but I realize that is probably not the best solution
here.  It should ideally be part of a performance regression suite.

Numbers for clob inserts in seconds for one insert  on my laptop  - as per the jira issue.
With fix , times are in seconds for 1 insert on a clob on my laptop (windows, 1G ram, 1.6Ghz
Intel Pentium(M) )

FileSize            ibm jvm 1.4.2      sun jvm 1.4.2         sun jvm 1.5
500k               0.9s                     1.6s                        1.7s
1M                  2.1s                     4s                           5s
2M                  3s                        9s                           11s
4M                  7s                        18s                         22s

Without the fix, 500k with sun jvm takes 3 mins and ibm jvm takes 20 seconds.
I will add the test program along with the input files to jira issue.
Without this fix :  As I already mentioned in the jira comment for derby302,  I changed the
program in the attached jira entry to use BufferedReader with the buffersize set to a bigger
value than the default ( to 64k) brought down the times for sun jvm closer to ibm jvm.  I
noticed that in my test, if I ran the test multiple times and did multiple inserts the performance
of sun jvm and ibm jvm for 500k clob was around 20 seconds - guess the jit kicks in , plus
the OS cache may also be a factor..

> Takes over 3 minutes to insert a 500kb String into CLOB
> -------------------------------------------------------
>          Key: DERBY-302
>          URL: http://issues.apache.org/jira/browse/DERBY-302
>      Project: Derby
>         Type: Bug
>   Components: SQL
>  Environment: Windows XP Professional, Dell Pentium IV
>     Reporter: Glenn Orbon
>     Assignee: Sunitha Kambhampati
>  Attachments: fixclobperf_0609.txt
> I downloaded a Cloudescape Version 10 from here http://www-106.ibm.com/developerworks/db2/library/techarticle/dm-0408cline/index.html.
 Not sure exactly which version of 10 this is.
> Would like to store up to a 10MB XML string in a CLOB.  However, I noticed it took over
3 minutes to insert a 500kb string.  I am using the PreparedStatement to get around the 37kb
limitation encountered in a previous issue.  It takes less than 2 seconds to insert a 500kb
string into a CLOB in mySQl.
> Here the snippet:
>       FileReader fr = new FileReader ("sample.txt");  
>       BufferedReader br = new BufferedReader(fr);
>       String record = null;
>       try {
>         while ( (record=br.readLine()) != null ) {
>           bufferStr.append( record );
>       }
>       } catch (IOException e) {
>       //
>       // put your error-handling code here
>       //
>          System.out.println("Error reading file");
>       }
>       System.out.println("Size of inputStr: "+bufferStr.length() );
>       PreparedStatement ps = mm.connection.prepareStatement("INSERT into  documents 
         VALUES (?,?)" );
>       ps.setInt(1, 13 );
>       StringReader reader1 = new StringReader(bufferStr.toString());
>       ps.setCharacterStream(2, reader1, bufferStr.length());
>       System.out.println("Uploading string....");
>       ps.execute();
>       System.out.println("Done uploading string...");
>       mm.connection.commit();
> Thanks for your the help,
> Glenn O.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators:
For more information on JIRA, see:

View raw message