Return-Path: X-Original-To: apmail-manifoldcf-dev-archive@www.apache.org Delivered-To: apmail-manifoldcf-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1D050179CF for ; Sun, 31 May 2015 09:50:18 +0000 (UTC) Received: (qmail 24870 invoked by uid 500); 31 May 2015 09:50:17 -0000 Delivered-To: apmail-manifoldcf-dev-archive@manifoldcf.apache.org Received: (qmail 24826 invoked by uid 500); 31 May 2015 09:50:17 -0000 Mailing-List: contact dev-help@manifoldcf.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@manifoldcf.apache.org Delivered-To: mailing list dev@manifoldcf.apache.org Received: (qmail 24727 invoked by uid 99); 31 May 2015 09:50:17 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 31 May 2015 09:50:17 +0000 Date: Sun, 31 May 2015 09:50:17 +0000 (UTC) From: "Karl Wright (JIRA)" To: dev@manifoldcf.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CONNECTORS-1191) ManifoldCFException: Unexpected job status encountered MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CONNECTORS-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566485#comment-14566485 ] Karl Wright commented on CONNECTORS-1191: ----------------------------------------- Hi Arcadius, This is consistent with what I have seen before, but the frequency with which you see this occur is easily two orders of magnitude greater than for anyone else. What is happening is the following. 1. The stuffer thread locates documents for processing. It marks these documents as ACTIVE. For performance reasons, it does not use FOR UPDATE, but instead uses a global write lock. 2. The thread puts the documents into an internal queue. 3. A worker thread picks up each document off the queue. It processes the document. At the end of this time, it marks the document as COMPLETE. What is happening when it fails is that when it does to mark the document as COMPLETE, it is finding that it is already in that state. Obviously this should not happen. The only thing I can think of is that maybe the other errors you are seeing somehow cause documents that have already been processed by a single worker thread to be processed a second time. I will once again audit that code looking for any possible way that that could occur. Sent from my Windows Phone From: Arcadius Ahouansou (JIRA) Sent: 5/30/2015 9:33 PM To: dev@manifoldcf.apache.org Subject: [jira] [Commented] (CONNECTORS-1191) ManifoldCFException: Unexpected job status encountered [ https://issues.apache.org/jira/browse/CONNECTORS-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566286#comment-14566286 ] Arcadius Ahouansou commented on CONNECTORS-1191: ------------------------------------------------ {code} WARN 2015-05-29 00:23:25,085 (Worker thread '19') - Server dropped connection during indexing http://mysite.com/?view=desktop: Broken pipe java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:126) at org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:138) at org.apache.http.impl.io.SessionOutputBufferImpl.flush(SessionOutputBufferImpl.java:146) at org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:194) at org.apache.http.entity.mime.content.StringBody.writeTo(StringBody.java:176) at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211) at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229) at org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:187) at org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:123) at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:158) at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:162) at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:462) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:233) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:225) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152) at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:901) WARN 2015-05-29 00:23:25,086 (Worker thread '19') - Service interruption reported for job 1430004534333 connection 'web': Server dropped connection during indexing http://mysite.com/?view=x: Broken pipe WARN 2015-05-29 00:36:00,122 (Worker thread '14') - Server dropped connection during indexing http://mysite.com/?view=y: Broken pipe java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:126) at org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:138) at org.apache.http.impl.io.SessionOutputBufferImpl.flush(SessionOutputBufferImpl.java:146) at org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:194) at org.apache.http.entity.mime.content.StringBody.writeTo(StringBody.java:176) at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211) at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229) at org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:187) at org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:123) at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:158) at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:162) at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:462) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:233) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:225) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152) at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:901) WARN 2015-05-29 00:36:00,123 (Worker thread '14') - Service interruption reported for job 1430004534323 connection 'web': Server dropped connection during indexing http://mysite.com/?view=a: Broken pipe ERROR 2015-05-29 00:36:19,666 (Worker thread '65') - Exception tossed: Unexpected jobqueue status - record id 1430006793452, expecting active status, saw 2 org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected jobqueue status - record id 1430006793452, expecting active status, saw 2 at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1019) at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3260) at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:661) ERROR 2015-05-29 00:36:22,510 (Worker thread '84') - Exception tossed: Unexpected jobqueue status - record id 1430352284513, expecting active status, saw 2 org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected jobqueue status - record id 1430352284513, expecting active status, saw 2 at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1019) at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3260) at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:661) ERROR 2015-05-29 00:38:03,439 (Worker thread '2') - Exception tossed: Unexpected jobqueue status - record id 1430092859936, expecting active status, saw 2 org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected jobqueue status - record id 1430092859936, expecting active status, saw 2 at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1019) at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3260) at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:661) WARN 2015-05-29 00:38:03,820 (Worker thread '17') - Service interruption reported for job 1430004534333 connection 'web': Job no longer active WARN 2015-05-29 00:41:01,378 (Worker thread '11') - Server dropped connection during indexing http://mysite.com/?view=b: Broken pipe ..... .. ... at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:152) at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:901) WARN 2015-05-30 09:53:06,147 (Worker thread '51') - Service interruption reported for job 1430004534328 connection 'web': Server dropped connection during indexing http://mysite.com/job/?view=az: Broken pipe ERROR 2015-05-30 09:53:55,386 (Worker thread '13') - Exception tossed: Unexpected jobqueue status - record id 1430089523518, expecting active status, saw 2 org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected jobqueue status - record id 1430089523518, expecting active status, saw 2 at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1019) at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3260) at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:661) {code} I am still seeing this. - I have only one instance of MCF running. - Running against MySQL - Running against a single instance of Solr - No Zookeeper installed - The instance has been started using: {code} java -Xmx1024m -Duser.timezone=GMT -Dapp.name=manifoldcf -Djetty.host=127.0.0.1 -Djetty.send.server.version=false -classpath /home/manifoldcf/apache-manifoldcf-2.2-dev/example/../lib/*:/home/manifoldcf/apache-manifoldcf-2.2-dev/example/../lib-proprietary/*: org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner {code} As a side note, I have seen a lot of "SELECT FOR UPDATE" in JobManager.java. Maybe, is it a good idea to review them and use their atomic/lock-free variant instead? -- This message was sent by Atlassian JIRA (v6.3.4#6332) > ManifoldCFException: Unexpected job status encountered > ------------------------------------------------------ > > Key: CONNECTORS-1191 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1191 > Project: ManifoldCF > Issue Type: Bug > Components: Web connector > Affects Versions: ManifoldCF 2.0.2 > Environment: - Debian 7.8 x86_64 GNU/Linux > - Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode) > - Server version: 5.5.41-0+wheezy1 (Debian) > Reporter: Arcadius Ahouansou > Assignee: Karl Wright > Priority: Critical > Fix For: ManifoldCF 1.9, ManifoldCF 2.1 > > Attachments: CONNECTORS-1191.patch, manifoldcf2.0.2.log > > > Hello. > I am running the latest ManifoldCF 2.0.2 and my log is filled of > {code} > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected job status encountered: 34 > at org.apache.manifoldcf.crawler.jobs.Jobs.returnJobToActive(Jobs.java:2073) > at org.apache.manifoldcf.crawler.jobs.JobManager.resetJobs(JobManager.java:8261) > at org.apache.manifoldcf.crawler.system.JobResetThread.run(JobResetThread.java:91) > {code} > I have attached full log for more detail. > Note that I am running against MySQL. > Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)