manifoldcf-dev mailing list archives

From "Subasini Rath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
Date Tue, 15 Jan 2019 11:41:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742999#comment-16742999 ]

Subasini Rath commented on CONNECTORS-1563:
-------------------------------------------

Hi Karl,
   I hope you have seen the document with screenshots that I sent you by email.
Yes, I did the following steps:

1. In the output connection, I clicked the [Reset All associated Records] button and started indexing
from the beginning.
2. In the path tab, I changed the update handler to /update (the default was /update/extract).
3. In the schema tab, I unchecked the [Use the Extract Update Handler] check box. It forced me to
give a document length and a content field name.
4. Copied the existing job and created a new one with the same configuration.
5. Ran the new job.
6. In ManifoldCF, I can see the website being crawled and documents being processed, but
nothing appears in the Solr index.
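
The check in step 6 can also be made against Solr directly, independent of the ManifoldCF UI. The following is a minimal sketch, not part of the original report: the base URL http://localhost:8983/solr and the core name "mycore" are hypothetical placeholders, and the helper names are made up for illustration. It queries Solr's standard select handler with rows=0 and reads numFound from the JSON response. Note that Solr only makes added documents searchable after a commit, so a count of 0 may mean either that nothing arrived or that nothing has been committed yet.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_count_url(base_url, core, query="*:*"):
    """Build a select URL that asks only for the hit count (rows=0)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def parse_num_found(response_text):
    """Extract the total hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def count_documents(base_url, core, query="*:*"):
    """Ask a running Solr instance how many documents match the query."""
    with urlopen(build_count_url(base_url, core, query)) as resp:
        return parse_num_found(resp.read().decode("utf-8"))


# Example call (assumed URL and core name, requires a running Solr):
#   count_documents("http://localhost:8983/solr", "mycore")
```

If the count stays at 0 after the job finishes, the Solr log for the /update handler would be the next place to look.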

Please guide.






Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889 
M: +91 983-1234-341
Email: Subasini.Rath@endeavourenergy.com.au



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
> -----------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1563
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
>             Project: ManifoldCF
>          Issue Type: Task
>          Components: Lucene/SOLR connector
>            Reporter: Sneha
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: managed-schema, manifold settings.docx, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler:" param, and I am then getting an error on
> Solr, i.e. null:org.apache.solr.common.SolrException: org.apache.tika.exception.ZeroByteFileException:
> InputStream must have > 0 bytes
> If I ignore the Tika exception, my documents get indexed but don't have a content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
