manifoldcf-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wright <daddy...@gmail.com>
Subject RE: [jira] [Commented] (CONNECTORS-1191) ManifoldCFException: Unexpected job status encountered
Date Sun, 31 May 2015 09:47:44 GMT
Hi Arcadius,

This is consistent with what I have seen before, but the frequency with
which you see this occur is easily two orders of magnitude greater than
for anyone else.

What is happening is the following.
1. The stuffer thread locates documents for processing. It marks these
documents as ACTIVE. For performance reasons, it does not use FOR
UPDATE, but instead uses a global write lock.
2. The thread puts the documents into an internal queue.
3. A worker thread picks up each document off the queue. It processes
the document. At the end of this time, it marks the document as
COMPLETE.

What is happening when it fails is that when it does to mark the
document as COMPLETE, it is finding that it is already in that state.
Obviously this should not happen.

The only thing I can think of is that maybe the other errors you are
seeing somehow cause documents that have already been processed by a
single worker thread to be processed a second time. I will once again
audit that code looking for any possible way that that could occur.

Sent from my Windows Phone
From: Arcadius Ahouansou (JIRA)
Sent: 5/30/2015 9:33 PM
To: dev@manifoldcf.apache.org
Subject: [jira] [Commented] (CONNECTORS-1191) ManifoldCFException:
Unexpected job status encountered

    [ https://issues.apache.org/jira/browse/CONNECTORS-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566286#comment-14566286
]

Arcadius Ahouansou commented on CONNECTORS-1191:
------------------------------------------------

{code}
 WARN 2015-05-29 00:23:25,085 (Worker thread '19') - Server dropped
connection during indexing http://mysite.com/?view=desktop: Broken
pipe
java.net.SocketException: Broken pipe
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
	at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:126)
	at org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:138)
	at org.apache.http.impl.io.SessionOutputBufferImpl.flush(SessionOutputBufferImpl.java:146)
	at org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:194)
	at org.apache.http.entity.mime.content.StringBody.writeTo(StringBody.java:176)
	at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211)
	at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229)
	at org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:187)
	at org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:123)
	at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:158)
	at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:162)
	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:462)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:233)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:225)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
	at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:901)
 WARN 2015-05-29 00:23:25,086 (Worker thread '19') - Service
interruption reported for job 1430004534333 connection 'web': Server
dropped connection during indexing http://mysite.com/?view=x: Broken
pipe
 WARN 2015-05-29 00:36:00,122 (Worker thread '14') - Server dropped
connection during indexing http://mysite.com/?view=y: Broken pipe
java.net.SocketException: Broken pipe
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
	at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:126)
	at org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:138)
	at org.apache.http.impl.io.SessionOutputBufferImpl.flush(SessionOutputBufferImpl.java:146)
	at org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:194)
	at org.apache.http.entity.mime.content.StringBody.writeTo(StringBody.java:176)
	at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211)
	at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229)
	at org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:187)
	at org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:123)
	at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:158)
	at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:162)
	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:462)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:233)
	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:225)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
	at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:901)
 WARN 2015-05-29 00:36:00,123 (Worker thread '14') - Service
interruption reported for job 1430004534323 connection 'web': Server
dropped connection during indexing http://mysite.com/?view=a: Broken
pipe
ERROR 2015-05-29 00:36:19,666 (Worker thread '65') - Exception tossed:
Unexpected jobqueue status - record id 1430006793452, expecting active
status, saw 2
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected
jobqueue status - record id 1430006793452, expecting active status,
saw 2
	at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1019)
	at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3260)
	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:661)
ERROR 2015-05-29 00:36:22,510 (Worker thread '84') - Exception tossed:
Unexpected jobqueue status - record id 1430352284513, expecting active
status, saw 2
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected
jobqueue status - record id 1430352284513, expecting active status,
saw 2
	at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1019)
	at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3260)
	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:661)
ERROR 2015-05-29 00:38:03,439 (Worker thread '2') - Exception tossed:
Unexpected jobqueue status - record id 1430092859936, expecting active
status, saw 2
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected
jobqueue status - record id 1430092859936, expecting active status,
saw 2
	at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1019)
	at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3260)
	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:661)
 WARN 2015-05-29 00:38:03,820 (Worker thread '17') - Service
interruption reported for job 1430004534333 connection 'web': Job no
longer active
 WARN 2015-05-29 00:41:01,378 (Worker thread '11') - Server dropped
connection during indexing http://mysite.com/?view=b: Broken pipe
.....
..
...

	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152)
	at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:901)
 WARN 2015-05-30 09:53:06,147 (Worker thread '51') - Service
interruption reported for job 1430004534328 connection 'web': Server
dropped connection during indexing http://mysite.com/job/?view=az:
Broken pipe
ERROR 2015-05-30 09:53:55,386 (Worker thread '13') - Exception tossed:
Unexpected jobqueue status - record id 1430089523518, expecting active
status, saw 2
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected
jobqueue status - record id 1430089523518, expecting active status,
saw 2
	at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1019)
	at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3260)
	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:661)

{code}

I am still seeing this.

- I have only one instance of MCF running.
- Running against MySQL
- Running against a single instance of Solr
- No Zookeeper installed
- The instance has been started using:
{code}
java -Xmx1024m -Duser.timezone=GMT -Dapp.name=manifoldcf
-Djetty.host=127.0.0.1 -Djetty.send.server.version=false -classpath
/home/manifoldcf/apache-manifoldcf-2.2-dev/example/../lib/*:/home/manifoldcf/apache-manifoldcf-2.2-dev/example/../lib-proprietary/*:
org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner
{code}

As a side note, I have seen a lot of "SELECT FOR UPDATE" in JobManager.java.
Maybe, is it a good idea to review them and use their atomic/lock-free
variant instead?


> ManifoldCFException: Unexpected job status encountered
> ------------------------------------------------------
>
>                 Key: CONNECTORS-1191
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1191
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.0.2
>         Environment: - Debian 7.8 x86_64 GNU/Linux
> - Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
> - Server version: 5.5.41-0+wheezy1 (Debian)
>            Reporter: Arcadius Ahouansou
>            Assignee: Karl Wright
>            Priority: Critical
>             Fix For: ManifoldCF 1.9, ManifoldCF 2.1
>
>         Attachments: CONNECTORS-1191.patch, manifoldcf2.0.2.log
>
>
> Hello.
> I am running the latest ManifoldCF 2.0.2 and my log is filled of
> {code}
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job status encountered:
34
> 	at org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:2073)
> 	at org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:8261)
> 	at org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:91)
> {code}
> I have attached full log for more detail.
> Note that I am running against MySQL.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message