lucene-solr-user mailing list archives

From Abdelhamid ABID <>
Subject Solrj doesn't tell if PDF was actually parsed by Tika
Date Thu, 25 Mar 2010 14:21:40 GMT
When posting PDF files using SolrJ, the only response we get from Solr is
the server response status, so we never know whether the PDF was actually
parsed. Checking the log, I found that Tika failed to parse some PDF
files, either because of the nature of their content (text in images
only) or because they are corrupted:

     25 mars 2010 14:54:07 org.apache.pdfbox.util.PDFStreamEngine
     INFO: unsupported/disabled operation: EI

     25 mars 2010 14:54:02 org.apache.pdfbox.filter.FlateFilter decode
     GRAVE: Stop reading corrupt stream

The question is: how can I catch these kinds of errors through SolrJ?

