Have you used the DRCP settings in DBD::Oracle?
http://search.cpan.org/~pythian/DBD-Oracle-1.66/lib/DBD/Oracle.pm#Oracle_DRCP

From the description of DRCP, it sounds like you should never lose the connection.  You may want to check the configuration.

From your debugging output, it looks like the line that tries to re-connect is failing.  I'd add some debugging statements to confirm that, and then try writing a simple script outside of mod_perl that tries to re-connect to Oracle after you drop the connection from the Oracle side.  To figure out more, you'll probably need to talk to whoever maintains DBD::Oracle.

- Perrin


On Wed, Nov 6, 2013 at 8:55 PM, Xinhuan Zheng <xzheng@christianbook.com> wrote:
Hi Perrin,

I am using Oracle Resident Connection Pool feature. The application randomly got "ORA-03114 not connect to oracle database" error. As per this document:

http://search.cpan.org/~phred/Apache-DBI-1.12/lib/Apache/DBI.pm

I have changed the DBD::Oracle to use the following ping instead of ora_ping:

sub ping {
my ($dbh) = @_;
my $ret = 0;
eval {
local $SIG{__DIE__} = sub { return (0); };
local $SIG{__WARN__} = sub { return (0); };
# adapt the select statement to your database:
$ret = $dbh->do('select 1');
};
return ($@) ? 0 : $ret;
}

After this change, the application can detect "end-of-file communication" error then after the application establishes new connection, the application still throws "ORA-03114 not connect to oracle database" error. 

below is the errors when Apache::DBI debug is set:

3861 Apache::DBI             need ping: yes
DBD::Oracle::db ping failed: ORA-03113: end-of-file on communication channel
Process ID: 29671
Session ID: 3219 Serial number: 544 (DBD ERROR: OCIStmtExecute/Describe) at /usr/local/lib/perl5/site_perl/5.8.3/Apache/DBI.pm line 200.
3861 Apache::DBI             new connect to '...'
3861 Apache::DBI             need ping: yes
3861 Apache::DBI             already connected to '...'
3861 Apache::DBI             need ping: yes
3861 Apache::DBI             already connected to '...'
3861 Apache::DBI             need ping: yes
3861 Apache::DBI             already connected to '...'
3861 Apache::DBI             need ping: yes
3861 Apache::DBI             already connected to '...'
3861 Apache::DBI             need ping: yes
3861 Apache::DBI             already connected to '...'
3861 Apache::DBI             need ping: yes
3861 Apache::DBI             already connected to '...'
3861 Apache::DBI             need ping: yes
DBD::Oracle::st execute failed: ORA-03114: not connected to ORACLE (DBD ERROR: OCIStmtExecute)
3861 Apache::DBI             need ping: yes
3861 Apache::DBI             already connected to ''
3861 Apache::DBI             disconnect (overloaded)
[Wed Nov  6 16:49:50 2013] [crit] ORA-03114: not connected to ORACLE (DBD ERROR: OCIStmtExecute).

Process 3861 establishes a new connection. and it says "already connected to" database. While the query execution failed with "ORA-03114" and the httpd process is in CLOSE_WAIT state. How does it thinks it's already connected while it's actually not?

  • xinhuan
------------------------------------------------------------------------------------------------------------------------------------------------------------

From:  Perrin Harkins <pharkins@gmail.com>
Date:  Wednesday, November 6, 2013 5:54 PM
To:  Xinhuan Zheng <xzheng@christianbook.com>
Cc:  mod_perl list <modperl@perl.apache.org>
Subject:  Re: Apache::DBI connect



On Wed, Nov 6, 2013 at 4:07 PM, Xinhuan Zheng

I wonder this line of code should be changed to 'DBI->connect(@args) if ($@)'. If ping failed, that means the connection is already closed, $drh may be no longer valid, will $drh->connect always return a valid new connection? What's the difference between DBI->connect
vs $drh->connect?



The $drh there is a DBI driver, not a database handle.  It should return a new connection every time.  You can try putting in DBI->connect() there and see if it fixes the problem for you though.  If not, I'd suggest reporting the problem as a DBD::Oracle issue.

By the way, which Oracle pooling solution are you using and how is it working out for you, aside from this problem?

- Perrin