lucene-solr-user mailing list archives

From Mareike Glock <mareike.gl...@Student.HTW-Berlin.de>
Subject Problem with SolrJ and indexing PDF files
Date Sun, 19 May 2019 09:02:16 GMT
Dear Solr Team,

I am trying to index Word and PDF documents with Solr using SolrJ, but 
most of the examples I found on the internet use the SolrServer class 
which I guess is deprecated.
The connection to Solr itself works: I can add SolrInputDocuments to 
the index. Indexing rich documents, however, fails with an exception.


public static void main(String[] args) throws IOException, SolrServerException {
         String urlString = "http://localhost:8983/solr/localDocs16";
         HttpSolrClient solr = new HttpSolrClient.Builder(urlString).build();

         //is working
         for(int i=0;i<1000;++i) {
             SolrInputDocument doc = new SolrInputDocument();
             doc.addField("cat", "book");
             doc.addField("id", "book-" + i);
             doc.addField("name", "The Legend of the Hobbit part " + i);
             solr.add(doc);
             if(i%100==0) solr.commit();  // periodically flush
         }

         //is not working
         File file = new File("path\\testfile.pdf");

         ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("update/extract");

         req.addFile(file, "application/pdf");
         req.setParam("literal.id", "doc1");
         req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
         try {
             solr.request(req);
         } catch (IOException e) {
             PrintWriter out = new PrintWriter("C:\\Users\\mareike\\Desktop\\filename.txt");
             e.printStackTrace(out);
             out.close();
             System.out.println("IO message: " + e.getMessage());
         } catch (SolrServerException e) {
             PrintWriter out = new PrintWriter("C:\\Users\\mareike\\Desktop\\filename.txt");
             e.printStackTrace(out);
             out.close();
             System.out.println("SolrServer message: " + e.getMessage());
         } catch (Exception e) {
             PrintWriter out = new PrintWriter("C:\\Users\\mareike\\Desktop\\filename.txt");
             e.printStackTrace(out);
             out.close();
             System.out.println("UnknownException message: " + e.getMessage());
         } finally {
             solr.commit();
         }
}
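
The three catch blocks above repeat the same dump-to-file steps and leave the writer open if printing fails. As a minimal sketch only, the shared part could move into one helper that uses try-with-resources. The class name StackTraceDump, the method write, and the temp file used in main are hypothetical names chosen for illustration; they are not part of SolrJ or of the original program.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of SolrJ): writes a stack trace to a file.
final class StackTraceDump {
    private StackTraceDump() {}

    /** Writes the stack trace of e to target, replacing any existing content. */
    static void write(Throwable e, Path target) throws IOException {
        // try-with-resources closes the writer even if printing throws
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
            e.printStackTrace(out);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp file stands in for the Desktop path used above
        Path target = Files.createTempFile("solrj-error", ".txt");
        write(new IllegalStateException("demo failure"), target);
        System.out.println("Stack trace written to " + target);
    }
}
```

Each catch block would then shrink to a call such as StackTraceDump.write(e, somePath) followed by its own println.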


I am using Maven (pom.xml attached) and created a JAR file, which I then 
tried to execute from the command line, and this is the output I get:

     SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
     SLF4J: Defaulting to no-operation (NOP) logger implementation
     SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
     SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
     SLF4J: Defaulting to no-operation MDCAdapter implementation.
     SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.
     message: UnknownException message: Error from server at http://localhost:8983/solr/localDocs17: Bad contentType for search handler :application/pdf request={wt=javabin&version=2}
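
One possible lead, stated as an assumption rather than a confirmed diagnosis: the error mentions a search handler, which suggests the request did not reach the extraction handler at all. Solr handler paths are normally written with a leading slash ("/update/extract"), while the code above passes "update/extract". The sketch below uses a hypothetical helper, HandlerPaths.normalize (not part of SolrJ), to guarantee the leading slash before the path is handed to ContentStreamUpdateRequest.

```java
// Hypothetical helper (not part of SolrJ): ensures a handler path
// starts with exactly one leading slash.
final class HandlerPaths {
    private HandlerPaths() {}

    /** Returns the handler path with exactly one leading slash. */
    static String normalize(String handler) {
        String trimmed = handler.trim();
        // Skip any slashes already present at the start
        int start = 0;
        while (start < trimmed.length() && trimmed.charAt(start) == '/') {
            start++;
        }
        return "/" + trimmed.substring(start);
    }

    public static void main(String[] args) {
        // prints /update/extract
        System.out.println(normalize("update/extract"));
    }
}
```

In the program above this would be used as new ContentStreamUpdateRequest(HandlerPaths.normalize("update/extract")); writing the literal "/update/extract" directly would have the same effect.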



I hope you may be able to help me with this. I also posted this issue on 
Stack Overflow 
<https://stackoverflow.com/questions/56149903/indexing-rich-documents-with-solrj-bad-contenttype-for-search-handler>.

Cheers,
Mareike Glock

