manifoldcf-user mailing list archives

From Kamil Żyta <kamil.z...@pwr.edu.pl>
Subject Re: ElastiSearch missing doc
Date Fri, 12 Dec 2014 14:41:42 GMT
On Fri, Dec 12, 2014 at 09:14:40AM -0500, Karl Wright wrote:
> Hi Kamil,
> 
> You are getting a 404 error when ManifoldCF tries to delete a document from
> the ElasticSearch index:
> 
> >>>>>>
>     else if (code == 404)
>     {
>       setResult(IOutputHistoryActivity.HTTP_ERROR,Result.ERROR,
>           "Page not found: " + response);
>       throw new ManifoldCFException("Server/page not found");
>     }
> <<<<<<
> 
> The URL it is using is constructed as follows:
> 
> >>>>>>
>       String idField = URLEncoder.encode(documentURI);
>       HttpDelete method = new HttpDelete(config.getServerLocation() +
>           "/" + config.getIndexName() + "/" + config.getIndexType()
>           + "/" + idField);
>       call(method);
> <<<<<<
> 
> So there are a number of possibilities.  First, ES may have been down
> entirely when this job ended, so the document removal requests failed
> for a legitimate reason.  Second, the document in question may already
> have been deleted; this would formerly have returned a 200 status code
> in the version of ES the connector was written for, but it now returns a
> 404.  Finally, the REST API may have changed so much that it is no longer
> possible to delete a document from the index this way.  What version of
> ElasticSearch are you using, and can you find REST API documentation for
> that version that you could point me at?  Can you do enough research to
> find out what should work here?
> 

  "version" : {
    "number" : "1.4.1",
    "build_hash" : "89d3241d670db65f994242c8e8383b169779e2d4",
    "build_timestamp" : "2014-11-26T15:49:29Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.2"
  },

The URL for deleting is correct: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete.html
I also found this: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/delete-doc.html

"If the document isn’t found, we get a 404 Not Found response code and a body like (...)"
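
Given that, one option might be for the connector to treat a 404 on a delete as "document already gone" rather than as a failure. Below is a minimal sketch of that idea only; it is not the actual ManifoldCF code. The class, enum, and method names are hypothetical, and it looks only at the HTTP status code, so it cannot tell a missing document apart from any other cause of a 404.

```java
public class DeleteStatusSketch {

    /** Hypothetical outcome type for illustration; not part of ManifoldCF. */
    public enum Outcome { DELETED, ALREADY_ABSENT, ERROR }

    /**
     * Classify the HTTP status code returned for a document DELETE request.
     * Assumption: a 404 on delete means the document was not in the index,
     * as the guide quoted above describes. Telling that apart from other
     * 404 causes would need a look at the response body, not shown here.
     */
    public static Outcome classifyDelete(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300) {
            return Outcome.DELETED;          // the delete was accepted
        }
        if (httpStatus == 404) {
            return Outcome.ALREADY_ABSENT;   // nothing to delete; not fatal
        }
        return Outcome.ERROR;                // anything else is still an error
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 404, 500}) {
            System.out.println(code + " -> " + classifyDelete(code));
        }
    }
}
```

With something like this, the cleanup thread could carry on after ALREADY_ABSENT instead of aborting the job, while other status codes would still be reported as errors.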

K

> 
> 
> 
> On Fri, Dec 12, 2014 at 8:56 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> >
> > Hi,
> > While I am testing ES as the indexer, some jobs end with 'Error: Server/page not
> > found'. In the ES log I see some exceptions about documents being too big.
> > How does this affect the job? Full MCF logs:
> >
> > ERROR 2014-12-12 14:45:24,915 (Document cleanup thread '2') - Exception tossed: Server/page not found
> > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Server/page not found
> >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.handleResultCode(ElasticSearchConnection.java:234)
> >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.call(ElasticSearchConnection.java:203)
> >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchDelete.execute(ElasticSearchDelete.java:45)
> >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector.removeDocument(ElasticSearchConnector.java:578)
> >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2350)
> >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1059)
> >         at org.apache.manifoldcf.crawler.system.DocumentCleanupThread.run(DocumentCleanupThread.java:189)
> >
> > Thanks,
> > Kamil
> >
