manifoldcf-dev mailing list archives

From "Subasini Rath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
Date Wed, 20 Feb 2019 09:47:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772827#comment-16772827 ]

Subasini Rath commented on CONNECTORS-1563:
-------------------------------------------

Hi Shinichiro,
My requirement is not to crawl a file system; it is to crawl a website. That is why I am using the Web repository connector.

Also, could you please let me know: when ManifoldCF writes to Solr, which field does it put the actual document content in, without any metadata?
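For the question above, one relevant mechanism sits on the Solr side. When a document goes through Solr Cell (the /update/extract handler), Tika's extracted body text arrives as a field named "content", and the `fmap.content=<field>` request parameter (or the handler's defaults in solrconfig.xml) may remap it to the schema field that should hold it; `literal.<field>=<value>` sets a field directly. This does not describe which parameters ManifoldCF itself sends. The sketch below only builds such a request URL; the Solr base URL, the core name `mycore`, the document ID and the target field `content_txt` are hypothetical placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public final class ExtractUrlSketch {

    // URL-encodes one query component (Java 8 compatible overload).
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds a Solr Cell request URL. solrBase, core and contentField are hypothetical.
    // literal.id sets the unique key; fmap.content maps Tika's extracted body
    // ("content") onto the schema field that should hold the plain text.
    static String buildExtractUrl(String solrBase, String core, String id, String contentField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", id);
        params.put("fmap.content", contentField);
        params.put("commit", "true");

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(enc(p.getKey()) + "=" + enc(p.getValue()));
        }
        return solrBase + "/" + core + "/update/extract?" + query;
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/mycore/update/extract?literal.id=https%3A%2F%2Fwww.example.com%2Fa+b&fmap.content=content_txt&commit=true
        System.out.println(buildExtractUrl(
                "http://localhost:8983/solr", "mycore",
                "https://www.example.com/a b", "content_txt"));
    }
}
```

The request body would be the raw document bytes. With commit=true every call triggers a commit, which is convenient for a one-off test but expensive for bulk indexing.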


I didn't get your point about the Simple History report.
My Solr log doesn't show any errors.
The ManifoldCF log is as follows:

======
ERROR 2019-02-18T21:19:25,484 (qtp1619356001-411) - Missing resource bundle 'org.apache.manifoldcf.agents.output.solr.common' for locale 'en': Can't find bundle for base name org.apache.manifoldcf.agents.output.solr.common, locale en; trying en_US
java.util.MissingResourceException: Can't find bundle for base name org.apache.manifoldcf.agents.output.solr.common, locale en
	at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1573) ~[?:1.8.0_181]
	at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1396) ~[?:1.8.0_181]
	at java.util.ResourceBundle.getBundle(ResourceBundle.java:1091) ~[?:1.8.0_181]
	at org.apache.manifoldcf.core.i18n.Messages.getResourceBundle(Messages.java:142) [mcf-core.jar:?]
	at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:178) [mcf-core.jar:?]
	at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:216) [mcf-core.jar:?]
	at org.apache.manifoldcf.agents.output.solr.Messages.getString(Messages.java:91) [mcf-solr-connector.jar:?]
	at org.apache.manifoldcf.agents.output.solr.Messages.getString(Messages.java:39) [mcf-solr-connector.jar:?]
	at org.apache.manifoldcf.agents.output.solr.SolrConnector.outputConfigurationHeader(SolrConnector.java:637) [mcf-solr-connector.jar:?]
	at org.apache.manifoldcf.core.interfaces.ConnectorFactory.outputThisConfigurationHeader(ConnectorFactory.java:71) [mcf-core.jar:?]
	at org.apache.manifoldcf.agents.interfaces.OutputConnectorFactory.outputConfigurationHeader(OutputConnectorFactory.java:98) [mcf-agents.jar:?]
	at org.apache.jsp.editoutput_jsp._jspService(editoutput_jsp.java:423) [jsp/:?]
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) [jasper-6.0.35.jar:6.0.35]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388) [jasper-6.0.35.jar:6.0.35]
	at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313) [jasper-6.0.35.jar:6.0.35]
	at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260) [jasper-6.0.35.jar:6.0.35]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) [jetty-security-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.Server.handle(Server.java:497) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [jetty-io-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610) [jetty-util-9.2.3.v20140905.jar:9.2.3.v20140905]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539) [jetty-util-9.2.3.v20140905.jar:9.2.3.v20140905]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
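The trace above is an i18n lookup failure: `ResourceBundle.getBundle`, called from `org.apache.manifoldcf.core.i18n.Messages`, could not find the bundle `org.apache.manifoldcf.agents.output.solr.common` while the output-connection edit page (`editoutput_jsp`) was being rendered. A minimal sketch of the same lookup in isolation, assuming nothing from ManifoldCF is on the classpath, raises the same `MissingResourceException`. The fall-back-to-key helper and the key `SolrConnector.Example` are hypothetical illustrations, not ManifoldCF's own code.

```java
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public final class BundleLookupSketch {

    // True if a bundle with this base name can be loaded for the locale.
    static boolean bundleExists(String baseName, Locale locale) {
        try {
            ResourceBundle.getBundle(baseName, locale);
            return true;
        } catch (MissingResourceException e) {
            return false;
        }
    }

    // Hypothetical fallback: return the key itself when the bundle or the key is
    // missing, instead of letting MissingResourceException propagate.
    static String lookupOrKey(String baseName, String key, Locale locale) {
        try {
            return ResourceBundle.getBundle(baseName, locale).getString(key);
        } catch (MissingResourceException e) {
            return key;
        }
    }

    public static void main(String[] args) {
        String base = "org.apache.manifoldcf.agents.output.solr.common";
        System.out.println(bundleExists(base, Locale.ENGLISH));                           // false
        System.out.println(lookupOrKey(base, "SolrConnector.Example", Locale.ENGLISH));   // SolrConnector.Example
    }
}
```

Whether the bundle is genuinely absent from the deployment or merely not visible to the class loader used by the web application is not something the trace alone settles.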





Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889 
M: +91 983-1234-341
Email: Subasini.Rath@endeavourenergy.com.au



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
> -----------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1563
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
>             Project: ManifoldCF
>          Issue Type: Task
>          Components: Lucene/SOLR connector
>            Reporter: Sneha
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: Document simple history.docx, Manifold and Solr settings_CustomField.docx, managed-schema, manifold settings.docx, manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> When I check the "Use the Extract Update Handler:" parameter, I get this error in Solr: null:org.apache.solr.common.SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
> If I ignore the Tika exception, my documents get indexed but don't have a content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor in the ManifoldCF pipeline.
> Please help me with this problem
> Thanks in advance
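Tika's `ZeroByteFileException` ("InputStream must have > 0 bytes") means the stream Solr Cell handed to Tika was empty. One way a client could avoid sending such a request is to buffer the body first and skip empty documents. A minimal sketch, assuming the whole document fits in memory; `readNonEmpty` is a hypothetical helper, not part of ManifoldCF or Solr.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public final class ZeroByteGuardSketch {

    // Reads the whole stream; returns empty when it held zero bytes, so the caller
    // can skip the extract request instead of letting Tika reject it.
    static Optional<byte[]> readNonEmpty(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.size() == 0 ? Optional.empty() : Optional.of(buf.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        InputStream empty = new ByteArrayInputStream(new byte[0]);
        InputStream page = new ByteArrayInputStream("<html>hi</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readNonEmpty(empty).isPresent()); // false
        System.out.println(readNonEmpty(page).isPresent());  // true
    }
}
```

Buffering trades memory for the ability to check the length up front; a streaming client could instead peek at the first byte with a `java.io.PushbackInputStream` and unread it before sending.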



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
