manifoldcf-user mailing list archives

From Karl Wright <>
Subject Re: Diagnosing "REJECTED" documents in job history
Date Wed, 30 Jan 2013 14:03:14 GMT
Ok, so let's back up a bit.

First, which version of ManifoldCF is this?  I need to know that
before I can interpret the stack trace.

Second, what do you see when you view the connection in the crawler
UI?  Does it say "Connection working", or something else, and if so,
what does it say?

I've created a ticket for better error reporting in this connector -
it was a contribution and AFAIK the error handling is not very robust
at this point, but I can fix that quickly with your help. ;-)


On Wed, Jan 30, 2013 at 8:55 AM, Andrew Clegg <> wrote:
> On 30 January 2013 13:33, Karl Wright <> wrote:
>> So you saw events in the history which correspond to these documents
>> and which are of type "Indexation" that say "success"?  If that is the
>> case, then the ElasticSearch connector thinks it handed the documents
>> successfully to the ElasticSearch server.
> Ah, no, the activity is fetch rather than indexation. e.g.
> 01-30-2013 13:08:16.217 fetch 09026205800698a9 Success 549541 361
> I don't see any history entries relating to indexing as a specific
> activity in its own right. Sorry, that was probably a red herring, I
> don't think it's getting that far.
> I just noticed that above all the "service interruption reported"
> warnings are some errors like this:
> ERROR 2013-01-30 13:44:15,356 (Worker thread '45') - Exception tossed:
> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>         at
>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex.<init>(
>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector.addOrReplaceDocument(
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(
>         at org.apache.manifoldcf.crawler.connectors.DCTM.DCTM.processDocuments(
>         at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(
>         at
> Sadly there's no description, just a stacktrace.
> I know the ES server is visible from the MCF server -- actually
> they're the same machine, and it's configured to use
> as the server URL. And I can go to the command
> line on that server and curl that URL successfully.
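
One way to settle the fetch-versus-indexation question above is to count history entries by activity. The following is a minimal sketch, not anything shipped with ManifoldCF: the class and method names are hypothetical, and it assumes history lines are whitespace-separated in the same shape as the single line quoted in this thread (date, time, activity, identifier, result, then two numeric columns), with the activity name as a single token.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of ManifoldCF.
class HistoryActivityCount {

    // Assumed layout, taken only from the line quoted in the thread:
    // date, time, activity, identifier, result, then numeric columns.
    static String activityOf(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 5) {
            throw new IllegalArgumentException("Unexpected history line: " + line);
        }
        return fields[2]; // third token is the activity, e.g. "fetch"
    }

    // Counts how many entries exist per activity, in first-seen order.
    static Map<String, Integer> countByActivity(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            counts.merge(activityOf(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "01-30-2013 13:08:16.217 fetch 09026205800698a9 Success 549541 361");
        // Prints the per-activity counts for the sample line from the thread.
        System.out.println(countByActivity(lines));
    }
}
```

If the resulting map contains only fetch entries, that matches the observation that the job never records an indexing activity of its own. Activity names that contain spaces would need a different split rule than the one assumed here.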
