manifoldcf-dev mailing list archives

From "Tim Steenbeke (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass
Date Mon, 07 Jan 2019 09:01:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735590#comment-16735590 ]

Tim Steenbeke commented on CONNECTORS-1562:
-------------------------------------------

I have the throttle set to null and max_connections set to 10, which was the default setting.

I'm also getting an error when I try to open my web output-connector; all other connectors and job editing work.
I'm building the ManifoldCF connectors and jobs using the API.
*HTTP ERROR 500*
Problem accessing /mcf-crawler-ui/editconnection.jsp. Reason:
    Server Error

*Caused by:*
{code:java}
org.apache.jasper.JasperException: An exception occurred processing JSP page /editconnection.jsp at line 564

561:
562:     if (className.length() > 0)
563:     {
564:       RepositoryConnectorFactory.outputConfigurationBody(threadContext,className,new org.apache.manifoldcf.ui.jsp.JspWrapper(out,adminprofile),pageContext.getRequest().getLocale(),parameters,tabName);
565:     }
566: %>
567:


Stacktrace:
    at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:430)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at org.apache.manifoldcf.core.common.Base64.decodeString(Base64.java:164)
    at org.apache.manifoldcf.connectorcommon.keystore.KeystoreManager.<init>(KeystoreManager.java:86)
    at org.apache.manifoldcf.connectorcommon.interfaces.KeystoreManagerFactory.make(KeystoreManagerFactory.java:47)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.fillInCertificatesTab(WebcrawlerConnector.java:1701)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.outputConfigurationBody(WebcrawlerConnector.java:1866)
    at org.apache.manifoldcf.core.interfaces.ConnectorFactory.outputThisConfigurationBody(ConnectorFactory.java:83)
    at org.apache.manifoldcf.crawler.interfaces.RepositoryConnectorFactory.outputConfigurationBody(RepositoryConnectorFactory.java:155)
    at org.apache.jsp.editconnection_jsp._jspService(editconnection_jsp.java:916)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388)
    ... 23 more
{code}
*Caused by:*
{code:java}
java.lang.NullPointerException
    at org.apache.manifoldcf.core.common.Base64.decodeString(Base64.java:164)
    at org.apache.manifoldcf.connectorcommon.keystore.KeystoreManager.<init>(KeystoreManager.java:86)
    at org.apache.manifoldcf.connectorcommon.interfaces.KeystoreManagerFactory.make(KeystoreManagerFactory.java:47)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.fillInCertificatesTab(WebcrawlerConnector.java:1701)
    at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.outputConfigurationBody(WebcrawlerConnector.java:1866)
    at org.apache.manifoldcf.core.interfaces.ConnectorFactory.outputThisConfigurationBody(ConnectorFactory.java:83)
    at org.apache.manifoldcf.crawler.interfaces.RepositoryConnectorFactory.outputConfigurationBody(RepositoryConnectorFactory.java:155)
    at org.apache.jsp.editconnection_jsp._jspService(editconnection_jsp.java:916)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
    at java.lang.Thread.run(Thread.java:748)
{code}
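
The root cause in the trace above is a NullPointerException thrown while a Base64 string is decoded during construction of the keystore manager for the Certificates tab. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that a connection created through the API carries no keystore value, so the decoder receives null. The sketch below only illustrates that failure mode and a defensive guard. It uses the JDK's java.util.Base64 as a stand-in for ManifoldCF's own Base64 class, and KeystoreGuardSketch and decodeOrEmpty are hypothetical names, not ManifoldCF code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only; not part of ManifoldCF.
public class KeystoreGuardSketch {

    // Treats a missing (null or empty) encoded value as "no data"
    // instead of passing it on to the decoder.
    static byte[] decodeOrEmpty(String encoded) {
        if (encoded == null || encoded.isEmpty()) {
            return new byte[0];
        }
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // A connection definition without a keystore value (assumed scenario).
        String missingKeystore = null;
        System.out.println("decoded bytes: " + decodeOrEmpty(missingKeystore).length);

        // A present value decodes as usual.
        byte[] data = decodeOrEmpty("aGVsbG8=");
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

If the assumption holds, supplying a keystore value when creating the connection through the API might avoid the error as well, but that has not been verified here.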
 

> Documents unreachable due to hopcount are not considered unreachable on cleanup pass
> ------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1562
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Elastic Search connector, Web connector
>    Affects Versions: ManifoldCF 2.11
>         Environment: ManifoldCF 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elasticsearch output connector
> Job crawls website input and outputs content to Elasticsearch
>            Reporter: Tim Steenbeke
>            Assignee: Karl Wright
>            Priority: Critical
>              Labels: starter
>             Fix For: ManifoldCF 2.12
>
>         Attachments: Screenshot from 2018-12-31 11-17-29.png, manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the Elasticsearch index after rerunning the job with changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
