Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Derby Discussion" <derby-user@db.apache.org>
Received-SPF: pass (herse.apache.org: local policy)
Date: Mon, 23 Apr 2007 13:25:37 +0200
From: Dyre.Tjeldvoll@Sun.COM
Subject: Re: Derby process taking up more than 20% of cpu for more than 5hrs
 with only one user connected.
In-reply-to: <loom.20070420T155847-825@post.gmane.org>
Sender: Dyre.Tjeldvoll@Sun.COM
To: Derby Discussion <derby-user@db.apache.org>
Message-id: <x1o3tzv7e3im.fsf@sun.com>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7BIT
References: <loom.20070420T155847-825@post.gmane.org>
User-Agent: Gnus/5.110004 (No Gnus v0.4) Emacs/22.0.50 (usg-unix-v)

Sachin Chitale <sachitale@yahoo.com> writes:

> Hi,
>
> To give you a background of what I was testing:
>
> We have process that uses database as our guranteed message store. To give a 
> stress test to Derby, this program was run to store about 70k+ messages. Then 
> a subscriber was started to fetch the messages. But it seems that the database 
> has stopped responding to the requests.
>
> Initially I was able to get message count using:
> ij>  select count(*) from M_20070129MESSAGES;
> 1
> -----------
> 54598
>
> After ~15 minutes.
> 1 row selected
> ij> select count(*) from M_20070129MESSAGES;
> ERROR 40XL1: A lock could not be obtained within the time requested
> ij> select count(*) from M_20070129MESSAGES;
> ERROR 40XL1: A lock could not be obtained within the time requested
>
> ij> select count(*) from M_20070129MESSAGES;
> ERROR 40XL1: A lock could not be obtained within the time requested
> ij> select * from M_20070129MESSAGES where MESSAGEID='000000000000000000001746'
> ;
> MESSAGEID               |MES&
> -----------------------------
> ERROR 40XL1: A lock could not be obtained within the time requested
>
> When I did prstat on the running derby process it gave the following result:
> -------------------------------------------
>    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
>  27823 schitale  76 1.0  -   -   -   -   21  -   73 127 767   0 java/28
>  27823 schitale 0.5 0.0  -   -   -   -   62  -    2  38  40   0 java/2
>  27823 schitale 0.5 0.0  -   -   -   -   62  -    2  36  38   0 java/4
>  27823 schitale 0.5 0.0  -   -   -   -   62  -    2  37  39   0 java/3
>  27823 schitale 0.5 0.0  -   -   -   -   62  -    2  36  38   0 java/5
>  27823 schitale 0.2 0.0  -   -   -   -   62  -    0   0   0   0 java/11
>  27823 schitale 0.2 0.0  -   -   -   -   62  -    0   0   0   0 java/12
>  27823 schitale 0.1 0.0  -   -   -   -   62  -    1   0   5   0 java/6
>  27823 schitale 0.1 0.0  -   -   -   -   62  -    1   0   8   0 java/27
>  27823 schitale 0.0 0.0  -   -   -   -   63  -    4   0  12   0 java/8
>  27823 schitale 0.0 0.0  -   -   -   -   63  -    5   0  12   0 java/7
>  27823 schitale 0.0 0.0  -   -   -   -   63  -    0   0   0   0 java/1
>  27823 schitale 0.0 0.0  -   -   -   -   63  -    0   0   0   0 java/10
>  27823 schitale 0.0 0.0  -   -   -   -   63  -    0   0   0   0 java/15
>  27823 schitale 0.0 0.0  -   -   -   -  100  -    0   0   0   0 java/14
>   NLWP USERNAME  SIZE   RSS MEMORY      TIME  CPU
>     20 schitale 6610M 5433M    34%   1:59.03  23%
> Total: 1 processes, 20 lwps, load averages: 1.12, 1.12, 1.21
>
> Thread no 28 was taking us most of the CPU time, and 23% was the total CPU 
> cycle share was taken up by derby process.
>
> To find out which java thread was taking time I did a jstack of the process 
> which revealed the following:
>
> jstack 27823
> --------------------------------------------------------------------------
> Thread t@28: (state = IN_JAVA)
>  - java.util.Hashtable.get(java.lang.Object) @bci=45, line=339 (Compiled 
> frame; information may be imprecise)
>  - org.apache.derby.impl.store.raw.data.FileContainer.getUserPage
> (org.apache.derby.impl.store.raw.data.BaseContainerHandle, long, boolean, 
> boolean) @bci=48 (
> Compiled frame)
>  - org.apache.derby.impl.store.access.btree.BTreeScan.reposition
> (org.apache.derby.impl.store.access.btree.BTreeRowPosition, boolean) @bci=114 
> (Compiled frame
> )
>  - org.apache.derby.impl.store.access.btree.BTreeForwardScan.fetchRows
> (org.apache.derby.impl.store.access.btree.BTreeRowPosition, 
> org.apache.derby.iapi.types
> .DataValueDescriptor[][], org.apache.derby.iapi.types.RowLocation[], 
> org.apache.derby.iapi.store.access.BackingStoreHashtable, long, int[]) @bci=31 
> (Compiled
>  frame)
>  - org.apache.derby.impl.store.access.btree.BTreeScan.fetchNext
> (org.apache.derby.iapi.types.DataValueDescriptor[]) @bci=29 (Compiled frame)
>  - org.apache.derby.iapi.store.access.DiskHashtable.get(java.lang.Object) 
> @bci=4 (Interpreted frame)
>  - org.apache.derby.iapi.store.access.BackingStoreHashtable.get
> (java.lang.Object) @bci=27 (Interpreted frame)
>  - org.apache.derby.impl.sql.execute.HashScanResultSet.getNextRowCore() 
> @bci=64 (Interpreted frame)
>  - 
> org.apache.derby.impl.sql.execute.NestedLoopLeftOuterJoinResultSet.getNextRowCo

I'm guessing that you are executing a join that is so large that it
spills to disk. This could take a long time and so hold locks for a
long time. In addition to trying tableLocks (as was already
suggested), you could try to increase the heapSize for your jvm (the
amount of memory used in joins before spilling to disk is a percentage
of the heap size, I think). You could also try to increase the
pageCache size.

-- 
dt