manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ElastiSearch missing doc
Date Fri, 12 Dec 2014 14:55:41 GMT
I've created CONNECTORS-1120 for this fix.  I should have something to try
shortly.

Karl
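For reference, here is a minimal sketch of treating a 404 on delete as "already removed". It is hypothetical and is not the actual CONNECTORS-1120 change. It relies on the Definitive Guide statement quoted further down in this thread: ES answers a DELETE for a missing document with 404 Not Found. The class, enum, and method names are made up for illustration. One caveat applies. In principle a 404 could also point at a wrong server location or index path, which matches the other possibilities raised below. A more careful version might therefore inspect the response body before deciding.

```java
/**
 * Hypothetical sketch, not ManifoldCF's actual code: decide how to treat
 * the HTTP status code returned by an ElasticSearch DELETE request.
 * Assumption: for a removal, 404 means the document is already absent.
 */
public class DeleteResultSketch {

  public enum Outcome { DELETED, ALREADY_ABSENT, SERVER_ERROR, OTHER_ERROR }

  public static Outcome classifyDelete(int code) {
    if (code >= 200 && code < 300) {
      return Outcome.DELETED;            // document was found and removed
    }
    if (code == 404) {
      // ES 1.x answers 404 Not Found when the document id does not exist;
      // for a removal, "not there" is already the desired end state.
      return Outcome.ALREADY_ABSENT;
    }
    if (code >= 500) {
      return Outcome.SERVER_ERROR;       // server-side problem, retry later
    }
    return Outcome.OTHER_ERROR;          // e.g. 400 Bad Request
  }

  /** True when the cleanup step may consider the document gone. */
  public static boolean isRemovalComplete(int code) {
    Outcome outcome = classifyDelete(code);
    return outcome == Outcome.DELETED || outcome == Outcome.ALREADY_ABSENT;
  }

  public static void main(String[] args) {
    for (int code : new int[] {200, 404, 503, 400}) {
      System.out.println(code + " -> " + classifyDelete(code)
          + ", removal complete: " + isRemovalComplete(code));
    }
  }
}
```

With this split, only 5xx and other unexpected codes would surface as job errors. A missing document would let the cleanup thread move on.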


On Fri, Dec 12, 2014 at 9:41 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
>
> On Fri, Dec 12, 2014 at 09:14:40AM -0500, Karl Wright wrote:
> > Hi Kamil,
> >
> > You are getting a 404 error when ManifoldCF tries to delete a document from
> > the ElasticSearch index:
> >
> > >>>>>>
> >     else if (code == 404)
> >     {
> >       setResult(IOutputHistoryActivity.HTTP_ERROR, Result.ERROR,
> >         "Page not found: " + response);
> >       throw new ManifoldCFException("Server/page not found");
> >     }
> > <<<<<<
> >
> > The URL it is using is constructed as follows:
> >
> > >>>>>>
> >       String idField = URLEncoder.encode(documentURI);
> >       HttpDelete method = new HttpDelete(config.getServerLocation() +
> >           "/" + config.getIndexName() + "/" + config.getIndexType()
> >           + "/" + idField);
> >       call(method);
> > <<<<<<
> >
> > So there are a number of possibilities.  First, ES may have been down
> > entirely when this job ended, so the document removal requests failed
> > for a legitimate reason.  Second, the document in question may already
> > have been deleted: in the version of ES the connector was written for,
> > deleting a missing document returned a 200 status code, but it now
> > returns a 404.  Finally, the REST API may have changed so much that it is
> > no longer possible to delete a document from the index this way.  What
> > version of ElasticSearch are you using, and can you point me at the REST
> > API documentation for that version?  Can you do enough research to find
> > out what should work here?
> >
>
>   "version" : {
>     "number" : "1.4.1",
>     "build_hash" : "89d3241d670db65f994242c8e8383b169779e2d4",
>     "build_timestamp" : "2014-11-26T15:49:29Z",
>     "build_snapshot" : false,
>     "lucene_version" : "4.10.2"
>   },
>
> The URL used for deleting is correct:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete.html
> and I found this:
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/delete-doc.html
>
> "If the document isn’t found, we get a 404 Not Found response code and a
> body like (...)"
>
> K
>
> >
> >
> >
> > On Fri, Dec 12, 2014 at 8:56 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > >
> > > Hi,
> > > When I test ES as the output indexer, some jobs end with 'Error: Server/page
> > > not found'. In the ES log I have some exceptions about documents being too
> > > big. How does this affect the job? Full MCF logs:
> > >
> > > ERROR 2014-12-12 14:45:24,915 (Document cleanup thread '2') - Exception tossed: Server/page not found
> > > org.apache.manifoldcf.core.interfaces.ManifoldCFException: Server/page not found
> > >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.handleResultCode(ElasticSearchConnection.java:234)
> > >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.call(ElasticSearchConnection.java:203)
> > >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchDelete.execute(ElasticSearchDelete.java:45)
> > >         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector.removeDocument(ElasticSearchConnector.java:578)
> > >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2350)
> > >         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1059)
> > >         at org.apache.manifoldcf.crawler.system.DocumentCleanupThread.run(DocumentCleanupThread.java:189)
> > >
> > > Thanks,
> > > Kamil
> > >
>
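The delete URL construction quoted above can be sketched as a standalone helper. It joins the server location, index name, index type, and URL-encoded document URI. The class and method names below are hypothetical. This version passes an explicit "UTF-8" charset. It does not rely on the platform default that the one-argument URLEncoder.encode(String) uses. URLEncoder performs HTML form encoding, so a space becomes "+" and "/" becomes "%2F". If document ids contain spaces, it may be worth checking how ES decodes "+" inside a path segment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/**
 * Hypothetical helper mirroring the delete URL shape quoted in this thread:
 * <serverLocation>/<indexName>/<indexType>/<URL-encoded document URI>
 */
public class DeleteUrlSketch {

  public static String buildDeleteUrl(String serverLocation, String indexName,
      String indexType, String documentURI) {
    String idField;
    try {
      // Explicit UTF-8 instead of the platform default charset used by
      // the deprecated one-argument URLEncoder.encode(String).
      idField = URLEncoder.encode(documentURI, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // Every Java platform is required to support UTF-8.
      throw new IllegalStateException(e);
    }
    return serverLocation + "/" + indexName + "/" + indexType + "/" + idField;
  }

  public static void main(String[] args) {
    // Server, index, and type values here are placeholders.
    System.out.println(buildDeleteUrl("http://localhost:9200", "manifoldcf",
        "attachment", "http://example.com/some page.html"));
  }
}
```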
