lucene-solr-user mailing list archives

From cloax <>
Subject ExtractRequestHandler - not properly indexing office docs?
Date Fri, 19 Jun 2009 23:20:14 GMT

Hi there, 

I've got a Solr instance running and am feeding it rich binary documents to
index from a Django application. The setup works just fine with PDFs and
the like, but no matter what type of MS Word document (doc or docx) I feed
it, I can't get any results when searching for content-related queries.

I've curl'd with extract.only to verify that Solr (and Tika) could extract
the contents, and it happily spits back the extracted XHTML to me. That
content never seems to find its way into the ext.def.fl that I have

When I search for terms specific to the content of those documents, I get
zero hits. However, I do get hits on metadata-related queries (e.g. I store
the username of whoever uploaded each document).

Is there some magical bit I forgot to flip?
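
For reference, here is a minimal sketch of the two requests described above,
written in Python since the documents come from a Django application. The
parameter names extract.only and ext.def.fl are copied as written in this
message and may differ between Solr versions. The endpoint URL and the field
name "text" are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port, and handler path to your setup.
EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(default_field=None, extract_only=False, base=EXTRACT_URL):
    """Build the URL a document would be POSTed to.

    Parameter names are taken from the message above, not from a
    specific Solr release, so check them against your version.
    """
    params = []
    if extract_only:
        # Ask Solr to return the extracted XHTML instead of indexing it.
        params.append(("extract.only", "true"))
    if default_field:
        # Field that should receive the extracted body text.
        params.append(("ext.def.fl", default_field))
    if not params:
        return base
    return base + "?" + urlencode(params)


# "text" is a hypothetical field name; use the one defined in your schema.
check_url = build_extract_url(extract_only=True)
index_url = build_extract_url(default_field="text")
```

Posting the same Word document to check_url and then to index_url (for
example with curl -F) makes it possible to compare what is extracted with
what actually ends up searchable.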
