lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?
Date Thu, 09 Jul 2015 16:34:29 GMT
Wow, that code looks familiar ;)...

Anyway, what have you tried?
bq: It would pull it but when I got the results in Solr it would look
blank

How do you know this? Do _some_ docs have text in Solr while others
don't, or are all of your text fields blank? If it's all of them, I
suspect you're not storing the data.

What I'd do is isolate just the one file and look at the processing in
the debugger to see if any text is extracted. Then I'd look at the doc
in Word (or whatever) to ensure that there _is_ text in it. Then...

Perhaps the program is swallowing the error. Perhaps the file is
malformed and isn't being analyzed appropriately. Perhaps the
file isn't there at all.
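
One way to tell these cases apart for a single file is to make the extraction step report its outcome instead of failing quietly. The following is only a rough sketch under stated assumptions: `ExtractionCheck`, `Status`, and the `extractor` function are hypothetical names, and `extractor` stands in for whatever the indexer actually calls to get text out of a file (the Tika parse step, in this thread). A file that is missing would presumably surface as an exception from that step and land in the FAILED branch.

```java
import java.util.function.Function;

public class ExtractionCheck {

    /** Possible outcomes when checking one file. */
    public enum Status { OK, EMPTY_TEXT, FAILED }

    /**
     * Runs the (hypothetical) extractor on one path and reports what happened.
     * The extractor is an assumption: plug in the real extraction call here.
     */
    public static Status check(String path, Function<String, String> extractor) {
        try {
            String text = extractor.apply(path);
            if (text == null || text.trim().isEmpty()) {
                // Extraction "worked" but produced nothing to index.
                System.err.println("No text extracted from " + path);
                return Status.EMPTY_TEXT;
            }
            System.out.println(path + ": " + text.length() + " chars extracted");
            return Status.OK;
        } catch (RuntimeException e) {
            // Log the failure rather than swallowing it.
            System.err.println("Extraction failed for " + path + ": " + e);
            return Status.FAILED;
        }
    }
}
```

Running this against just the problem file should show whether the text is lost during extraction (EMPTY_TEXT or FAILED) or somewhere later, on the way into Solr.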

And sending one doc to Solr at a time isn't very efficient, but perhaps
some of your files are so big that it's better that way.
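
If batching does turn out to be practical, a small buffer in front of the client is one way to do it. This is a minimal sketch, not the code from the original post: `BatchSender` and its `sink` are hypothetical, and the sink stands in for the call that ships a list of documents to Solr (in SolrJ that would presumably be the client's `add` overload that accepts a collection of documents). The batch size is an assumption to tune against the size of your files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchSender<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;   // hypothetical: sends one batch to Solr
    private final List<T> buffer = new ArrayList<>();

    public BatchSender(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one document and sends the batch once it is full. */
    public void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; call once more after the last document. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Hand over a copy so clearing the buffer cannot affect the sink.
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With a batch size of 1 this behaves like the one-at-a-time approach, so the same code path covers the case where the files really are too big to group.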

Best,
Erick

On Thu, Jul 9, 2015 at 6:36 AM, Paden <rumsey.pr@gmail.com> wrote:
> I posted the code anyway just forgot to get rid of that line in the post.
> Sorry
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-Tika-custom-indexer-not-indexing-CERTAIN-doc-text-tp4216541p4216542.html
> Sent from the Solr - User mailing list archive at Nabble.com.
