lucene-solr-user mailing list archives

From Paden <rumsey...@gmail.com>
Subject TikaEntityProcessor Not Finding My Files
Date Tue, 16 Jun 2015 16:04:52 GMT
Hi, someone has already asked a question similar to this, and I'm basically
following what he did. It's exactly what I'm doing: taking a file path from a
database and using TikaEntityProcessor to analyze the document. His question
is here:

http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html#a3524905
 

His problem was a version issue with Tika, but I'm using a version that is
about five years newer, so I'm not sure whether it's still an issue with the
current version of Tika or whether I'm missing something extremely obvious
(which is possible; I'm extremely new to Solr). This is my data configuration.
TextContentURL is the file path!

<dataConfig>
  <dataSource name="ds-db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/EDMS_Metadata"
              user="root" password="**************" />
  <dataSource name="ds-file" type="BinFileDataSource" />

  <document name="doc1">
    <entity name="db-data" dataSource="ds-db"
            query="select TextContentURL as 'id',ID,Title,AuthorCreator from MasterIndex">
      <field column="TextContentURL" name="id" />
      <field column="Title" name="title" />
    </entity>
    <entity name="file" dataSource="ds-file" processor="TikaEntityProcessor"
            url="${db-data.TextContentURL}" format="text">
      <field column="text" name="text" />
    </entity>
  </document>
</dataConfig>

Note that when I delete the second entity and run only the database import,
it works fine. I can query the index, and I get this output when I run a
faceted search:

 "response": {
    "numFound": 283,
    "start": 0,
    "docs": [
      {
        "id": "/home/paden/Documents/LWP_Files/BIGDATA/6220106.pdf",
        "title": "ENGINEERING INITIATION",
      },

This means that it is pulling the document file path JUST FINE. The id is the
correct file path. But when I re-add the second entity, it logs errors saying
it can't find the file. Am I missing something obvious?
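
For comparison, here is a minimal sketch of the same configuration with the
"file" entity nested inside "db-data" instead of sitting next to it. It rests
on two assumptions that are not confirmed as the cause of the error: that DIH
only resolves placeholders such as ${db-data.id} inside a child entity of
"db-data", and that the aliased column has to be referenced as "id" rather
than "TextContentURL". Everything else is copied from the configuration above.

```xml
<dataConfig>
  <dataSource name="ds-db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/EDMS_Metadata"
              user="root" password="**************" />
  <dataSource name="ds-file" type="BinFileDataSource" />

  <document name="doc1">
    <!-- Parent entity: one row per document in MasterIndex. -->
    <entity name="db-data" dataSource="ds-db"
            query="select TextContentURL as 'id',ID,Title,AuthorCreator from MasterIndex">
      <field column="id" name="id" />
      <field column="Title" name="title" />
      <!-- Child entity: runs once per parent row, so the parent's columns
           should be in scope. Assumption: the path is exposed as "id". -->
      <entity name="file" dataSource="ds-file" processor="TikaEntityProcessor"
              url="${db-data.id}" format="text">
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the nested form still reports a missing file, the next thing to rule out
would be whether the user running Solr can read the path shown in the id
field.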



--
View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-Not-Finding-My-Files-tp4212241.html
Sent from the Solr - User mailing list archive at Nabble.com.
