lucene-dev mailing list archives

From "Lance Norskog (JIRA)" <>
Subject [jira] Created: (SOLR-2116) TikaEntityProcessor does not find parser by default
Date Thu, 09 Sep 2010 00:23:33 GMT
TikaEntityProcessor does not find parser by default

                 Key: SOLR-2116
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
    Affects Versions: 3.1, 4.0
            Reporter: Lance Norskog
         Attachments: pdflist-data-config.xml, pdflist.xml

The TikaEntityProcessor does not find the correct document parser by default.
This happens in a two-level DIH config file. I have attached pdflist-data-config.xml and pdflist.xml,
the XML file that supplies the file list. To test this, you will need the current 3.x branch or 4.0 trunk.
# Set up a Tika-enabled Solr.
# Copy any PDF file to /tmp/testfile.pdf.
# Copy pdflist-data-config.xml to your solr/conf directory.
# Add this snippet to your solrconfig.xml:
<requestHandler name="/pdflist" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">pdflist-data-config.xml</str>
  </lst>
</requestHandler>

Requesting [http://localhost:8983/solr/pdflist?command=full-import] creates one document with the id
and text fields populated. If you remove the line that sets the parser explicitly
from the TikaEntityProcessor entity, the parser will not be found and you will get a document
with the "id" field and nothing else.
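
For readers without the attachments, a two-level DIH configuration of the kind described above might look roughly like the sketch below. It is an illustration, not a copy of the attached pdflist-data-config.xml: the data source names, the layout of pdflist.xml, the XPath expressions, and the parser class shown are all assumptions. The outer entity reads the file list, and the inner TikaEntityProcessor entity extracts text from each listed file.

```xml
<!-- Hypothetical sketch; the attached pdflist-data-config.xml may differ. -->
<!-- Assumed pdflist.xml layout:
     <files><file id="1"><path>/tmp/testfile.pdf</path></file></files> -->
<dataConfig>
  <!-- Text source for the XML file list, binary source for the PDFs (names assumed) -->
  <dataSource name="list" type="FileDataSource" encoding="UTF-8"/>
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Level 1: one row per file entry; rootEntity="false" so that
         documents are created by the inner entity -->
    <entity name="pdflist" dataSource="list" processor="XPathEntityProcessor"
            url="/tmp/pdflist.xml" forEach="/files/file" rootEntity="false">
      <field column="id" xpath="/files/file/@id"/>
      <field column="path" xpath="/files/file/path"/>
      <!-- Level 2: extract text from the file named by the outer row.
           The parser attribute is the kind of explicit setting the report
           says is needed; the class shown here is an assumed example. -->
      <entity name="tika" dataSource="bin" processor="TikaEntityProcessor"
              url="${pdflist.path}" format="text"
              parser="org.apache.tika.parser.pdf.PDFParser">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

In a configuration shaped like this, dropping the parser attribute from the inner entity leaves parser selection to the processor's default, which is the case the report says fails.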

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.