cocoon-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject DO NOT REPLY [Bug 35595] New: - SQLTransformer makes Cocoon hang under high load
Date Mon, 04 Jul 2005 07:38:17 GMT

           Summary: SQLTransformer makes Cocoon hang under high load
           Product: Cocoon 2
           Version: 2.1.7
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: critical
          Priority: P2
         Component: blocks

Overview Description:

Concurrent requests to pages containing nested queries cause
the SQLTransformer to acquire all available connections
from the database pool, making them wait forever for further
connections to become available.
The connections are never returned to the pool, resulting in
a hang of the webapp (deadlock) until Cocoon is restarted.

Steps to Reproduce:

1) Build a default 2.1.7 Cocoon distribution.

2) Add your favorite database driver to web.xml and define
   a 'small' database connection pool in cocoon.xconf, for
   example:  <pool-controller min="2" max="5"/>
3) Create a 'hang.xml' file with nested queries to feed through
   the SQLTransformer. A dummy query which returns one row
   (like 'SELECT 1') will do just fine! Example:
    <?xml version="1.0"?>
    <sql xmlns:sql="">
            <sql:query name="query1">SELECT 1</sql:query>
                <sql:query name="query2">SELECT 2</sql:query>
                   <sql:query name="query3">SELECT 3</sql:query>

4) Define a pipeline in the root sitemap which will run the
   above file through the SQLTransformer:
     <map:match pattern="hang">
        <map:generate src="hang.xml"/>
        <map:transform type="sql">
          <map:parameter name="use-connection" value="dbpool"/>
        <map:serialize type="text"/>    
   Don't forget to test it!

   Requesting the url 'http://localhost:8888/hang' with a web-
   browser should return: '123'

5) Now run a http benchmarking tool (ab for instance) with multiple
   concurrent requests against the above page, for example:
     ab -n 100 -c 5 http://localhost:8888/hang
   This should launch 5 concurrent, and 100 total, requests.
   See what happens! (Timeout?)
   Try to reload the http://localhost:8888/hang url with your
   web-browser, and see what happens! (Timeout?)  
   Wait a few minutes, a few hours or a few days, try to access
   the above url again, and see what happens! (Timeout?)

Actual Results:

   Any subsequent request to the above url, or more generally
   to a page which requires a database connection from the pool
   ('dbpool' in the example) will hang forever!

   In a dynamic webapp, where every page request requires accessing
   the database, the above situation will result in a hang of your
   whole webapp ...until you restart Cocoon.

Expected Results:

   123  ;-)

   It is acceptable for the webapp to respond slowly, or to timeout/
   complain about requests which cannot be handled under heavy load!

   But the webapp should not hang forever, requiring a restart of

Build Date & Platform:

   I reproduced the behaviour with the current Cocoon 2.1.7 release
   on both WinXP and Linux RedHat, running Java 1.4.2, and with
   PostgreSQL 7.4.6 and Oracle9i as database backends.
   Given the diversity of OSes and DBs, it really doesn't smell
   like a platform-related problem!
Additional Information:

   Included is the post I made to the list
   prior to writing this bug report.
   It describes the problem in more detail, a brutal workaround,
   and a possible explanation of what is happening:

Hello again,

The deadlock situation you describe is a plausible explanation for the
experienced Cocoon 'hangs'!

I took a deep breath and had a look into, and I stumbled
upon the following code snippet in the executeQuery() method:

    try {
        if (index == 0) {
            if (this.conn == null) // The first top level execute-query
                this.conn = query.getConnection();
            // reuse the global connection for all top level queries
            conn = this.conn;
        else // index > 0, sub queries are always executed in an own connection
            conn = query.getConnection();

    } ...

Effectively, a new connection is being requested for every nested subquery!

What perplexes me, is that the getConnection() method appears to go to quite
some lengths to ensure that it won't block if it can't acquire a connection
after waiting/retrying a few times.
Normally it should complain (ie. write to the logs) and abort, but that never

I am going to post this as a bug report.

Thanks for the prompt response,

>> Hello,
>> I'm developing a dynamic Cocoon webapp that performs a lot of (nested)
>> queries using the SQLTransformer.
>> The SQLTransformer performs SELECT operations only, and it is configured to
>> use a JDBC database connection pool that is managed by Cocoon itself.
>> Excerpt from cocoon.xconf:
>>   <datasources>
>>     <jdbc logger="core.datasources.db" name="db">
>>       <pool-controller min="5" max="50"/>
>>       ...
>>     </jdbc>
>>   </datasources>
>> At some point, I decided to stress-test the webapp using the 'ab'-tool
>> (Apache benchmark) to simulate a large number of concurrent requests.
>> When the number of concurrent requests is heading towards the 'max'
>> number of pooled database connections, the following occurs:
>> - The 'ab' benchmark is aborted with a timeout error.
>> - Using a database monitoring tool, you will notice that the 'max' number of
>> pooled database connections have been left open.
>> - Cocoon 'hangs' indefinitely!
>> Any further requests to the webapp will timeout, and nothing gets written to
>> the logs (access.log, error.log...)
>> There are no warnings, nor any error messages in the logs that would
>> indicate why the hang occured in the first place. Up to the hang, everything
>> appears to run normally!
>> That's scary!
>> Currently, I work around the problem with a big axe! That is, by limiting the
>> number of socket listener threads of the servlet container, and by defining a
>> rather oversized pool of database connections.
>> Excerpt from Jetty's main.xml configuration file:
>>   <Call name="addListener">
>>     <Arg>
>>       <New class="org.mortbay.http.SocketListener">
>>         <Set name="MinThreads">8</Set>
>>         <Set name="MaxThreads">16</Set>
>>         ...
>>       </New>
>>     </Arg>
>>   </Call>
>> My best guess is that the SQLTransformer will wait indefinitely (causing
>> Cocoon to hang) if it fails to get a database connection from the pool.
>> The SQLTransformer further appears to require more than one database
>> connection (...which seems to be related to the depth of query nesting levels,
>> but that's a far-fetched guess!)
>> Has someone out there experienced similar problems, or have a more
>> enlightening  explanation to what is going on?
>> As my fix is based on a lot of guesswork, I would welcome a more robust solution.
>> Best regards,
>> Fabrizio
> It sounds like SQLTransformer naively gets a new connection per query.
> This in turn requires that if your page has 4 nested queries then
> SQLTransformer is using 4 of your 50 connections.  It only takes 13
> simultaneous requests to use up all the connections at that rate, and if
> all your requests have anywhere between 1 and 3 connections reserved and
> are not finished yet, none of the pages will release the connection and
> a deadlock occurs.  If you have as many as 10 nested queries it is
> obvious to see how something like this could occur.
> I think the ESQL logicsheet suffered from this many moons ago, but
> resolved it by requesting one connection per page.  BTW, Microsoft's
> feeble driver is limited to one connection per query--so your choice is
> to use jTDS or some other company's version.
> Nonetheless, it seems like there is a bug in SQLTransformer.  Submit a
> bug report, and hopefully it can get resolved.
> -- 
> Design is a funny word. Some people think design means how it looks.
> But of course, if you dig deeper, it's really how it works.
>                                                         -- Steve Jobs

Configure bugmail:
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

View raw message