manifoldcf-user mailing list archives

From: Shigeki Kobayashi <shigeki.kobayas...@g.softbank.co.jp>
Subject: Web crawling causes Socket Timeout after Database Exception
Date: Wed, 10 Oct 2012 07:51:13 GMT
Hi

I am having trouble crawling the web with MCF 1.0.
I run MCF with MySQL 5.5 and Tomcat 6.0.
The crawl should keep fetching content, but MCF logs the following database
exception and then hangs.
After the database exception, a SocketTimeoutException occurs.

Has anyone faced this problem?

--Database Exception log:

ERROR 2012-10-10 16:11:05,787 (Worker thread '42') - Worker thread aborting and restarting due to database connection reset: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:681)
        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:709)
        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1394)
        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
        at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:852)
        at org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4089)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.processDocumentReferences(WorkerThread.java:1932)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.addDocumentReference(WorkerThread.java:1487)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$ProcessActivityLinkHandler.noteDiscoveredLink(WebcrawlerConnector.java:6049)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector$ProcessAcivityHTMLHandler.noteAHREF(WebcrawlerConnector.java:6159)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.LinkParseState.noteNonscriptTag(LinkParseState.java:44)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.FormParseState.noteNonscriptTag(FormParseState.java:52)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.ScriptParseState.noteTag(ScriptParseState.java:50)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.BasicParseState.dealWithCharacter(BasicParseState.java:225)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleHTML(WebcrawlerConnector.java:7047)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:6011)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:1282)
        at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2624)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2127)
        at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2293)
        at org.apache.manifoldcf.core.database.Database.execute(Database.java:826)
        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:641)
ERROR 2012-10-10 16:11:06,799 (Worker thread '9') - Worker thread aborting and restarting due to database connection reset: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: Exception doing query: Lock wait timeout exceeded; try restarting transaction
        at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:681)
        at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:709)
        at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1394)
        at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
        at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
        at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:852)
        at org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4089)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.processDocumentReferences(WorkerThread.java:1932)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.flush(WorkerThread.java:1863)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:554)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2624)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2127)
        at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2293)
        at org.apache.manifoldcf.core.database.Database.execute(Database.java:826)
        at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:641)
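
The MySQL message itself advises "try restarting transaction". As a hedged illustration of what that advice means for a JDBC caller, here is a minimal sketch (hypothetical, not ManifoldCF code and not a fix for the hang) that recognizes MySQL error 1205 (ER_LOCK_WAIT_TIMEOUT) anywhere in an exception's cause chain, as with the ManifoldCFException wrapping a java.sql.SQLException above, and retries a unit of work a bounded number of times. The names LockWaitRetry, SqlWork and withRetry, and the linear backoff, are assumptions made for the example.

```java
import java.sql.SQLException;

/**
 * Hypothetical helper (not part of ManifoldCF): detects MySQL's
 * "Lock wait timeout exceeded" error and retries a unit of work.
 */
public class LockWaitRetry {

    // MySQL error 1205 (ER_LOCK_WAIT_TIMEOUT), reported with SQLSTATE HY000.
    static final int ER_LOCK_WAIT_TIMEOUT = 1205;

    /** A unit of work that may throw SQLException, e.g. one JDBC transaction. */
    @FunctionalInterface
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    /** True if t, or any exception in its cause chain, is a lock wait timeout. */
    static boolean isLockWaitTimeout(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof SQLException
                    && ((SQLException) cur).getErrorCode() == ER_LOCK_WAIT_TIMEOUT) {
                return true;
            }
        }
        return false;
    }

    /** Runs work, making at most maxAttempts attempts; only lock wait timeouts are retried. */
    static <T> T withRetry(SqlWork<T> work, int maxAttempts, long backoffMillis)
            throws SQLException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!isLockWaitTimeout(e) || attempt >= maxAttempts) {
                    throw e; // not retryable, or out of attempts
                }
                Thread.sleep(backoffMillis * attempt); // simple linear backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                // Simulate the error seen in the log above.
                throw new SQLException(
                    "Lock wait timeout exceeded; try restarting transaction", "HY000", 1205);
            }
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running main prints "committed after 3 attempts": the first two calls simulate the lock wait timeout and the third succeeds. Other SQL errors (for example a duplicate-key error) are rethrown immediately rather than retried.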



---- Socket Timeout:


DEBUG 2012-10-10 16:16:27,256 (Worker thread '49') - Socket timeout exception trying to close connection: Read timed out
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
        at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
        at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(Unknown Source)
        at org.apache.commons.httpclient.ContentLengthInputStream.close(Unknown Source)
        at java.io.FilterInputStream.close(FilterInputStream.java:155)
        at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(Unknown Source)
        at org.apache.commons.httpclient.AutoCloseInputStream.close(Unknown Source)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.close(ThrottledFetcher.java:2082)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.DataCache.addData(DataCache.java:176)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.getDocumentVersions(WebcrawlerConnector.java:745)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:321)
 INFO 2012-10-10 16:16:27,273 (Worker thread '49') - WEB: FETCH URL|http://xxxxxx/...|1349852786744+600514|-104|4125|org.apache.manifoldcf.core.interfaces.ManifoldCFException|Interrupted: Socket timeout: Read timed out
DEBUG 2012-10-10 16:16:27,273 (Worker thread '49') - WEB: Fetch exception for 'http://xxxxxx/...'
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Interrupted: Socket timeout: Read timed out
        at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledConnection.noteInterrupted(ThrottledFetcher.java:1818)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.getDocumentVersions(WebcrawlerConnector.java:797)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:321)
Caused by: org.apache.manifoldcf.agents.interfaces.ServiceInterruption: Socket timeout: Read timed out
        at org.apache.manifoldcf.crawler.connectors.webcrawler.DataCache.addData(DataCache.java:101)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.getDocumentVersions(WebcrawlerConnector.java:745)
        ... 1 more
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.basicRead(ThrottledFetcher.java:2012)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.read(ThrottledFetcher.java:1976)
        at org.apache.manifoldcf.crawler.connectors.webcrawler.DataCache.addData(DataCache.java:95)
        ... 2 more
 WARN 2012-10-10 16:16:27,274 (Worker thread '49') - Pre-ingest service interruption reported for job 1349774325961 connection 'WEB': Socket timeout: Read timed out
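
In the traces above, the SocketTimeoutException comes from blocking reads on the HTTP response stream: once in DataCache.addData, and once while ThrottledInputstream.close() drains the remaining body (ChunkedInputStream.exhaustInputStream). To illustrate what "Read timed out" means at the java.net level, here is a minimal, self-contained sketch (illustrative only, not ManifoldCF or HttpClient code): a local peer accepts a connection but never sends a byte, and the client's read gives up once the SO_TIMEOUT set with Socket.setSoTimeout elapses. The class name ReadTimeoutDemo and the 200 ms timeout are arbitrary choices for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/**
 * Illustrative reproduction of "java.net.SocketTimeoutException: Read timed out":
 * the peer accepts the connection but never writes, so a blocking read that
 * waits longer than the socket's SO_TIMEOUT is abandoned.
 */
public class ReadTimeoutDemo {

    /** Returns true if a read on a silent connection times out after timeoutMillis. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free ephemeral port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) { // the peer never writes anything
            client.setSoTimeout(timeoutMillis); // applies to each blocking read
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks: no bytes will arrive and the peer stays open
                return false;
            } catch (SocketTimeoutException e) {
                // Only this read was abandoned; the socket itself is still open.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        boolean timedOut = readTimesOut(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```

Running main should report that the read timed out after roughly 200 ms; the exact elapsed time varies by platform.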



Regards,

Shigeki
