lucene-solr-user mailing list archives

From "步青云" <>
Subject About indexing embed file with solr
Date Thu, 18 Jun 2015 02:17:14 GMT
      Is anyone receiving my email? I'm new to Solr and have some questions. Could anyone
help me with some answers?
      I index files directly, extracting their content with the Tika embedded in Solr.
There is no problem with normal files. But when I index a file that has another file
embedded in it, I can't get the content of the embedded file. For example, I have a Word
document (.doc) with a PDF embedded in it, and I can't index the content of the PDF. When I
use the same Tika jar on its own to extract the content of that file, I do get the content
of the embedded file.
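
      One way to compare the two code paths is to ask Solr to return what its own Tika
extracts instead of indexing it. The sketch below only builds such a request URL. It assumes
the extracting handler is registered at the usual /update/extract path; the host, the core
name (collection1), the file name and the helper build_extract_only_url are placeholders for
illustration, not details from this setup.

```python
from urllib.parse import urlencode


def build_extract_only_url(solr_base, core, resource_name):
    """Build a URL for Solr's extracting handler in extract-only mode.

    Assumption: the handler is mapped to /update/extract in solrconfig.xml.
    """
    params = {
        "extractOnly": "true",            # return the extracted content, do not index it
        "extractFormat": "text",          # plain text instead of XHTML
        "resource.name": resource_name,   # file name hint for Tika's type detection
        "wt": "json",
    }
    return "{}/{}/update/extract?{}".format(
        solr_base.rstrip("/"), core, urlencode(params)
    )


# Placeholder host, core and file name.
url = build_extract_only_url("http://localhost:8983/solr", "collection1", "report.doc")
print(url)
```

The document itself would then be sent as the POST body to this URL. If the text of the
embedded PDF is missing from the response but present when the same Tika jar is run on its
own, the difference lies in how Solr invokes Tika rather than in Tika itself.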
      I know Tika has been able to extract embedded files since version 1.3. My Solr
version is 4.9.1, and the Tika used in this version of Solr is 1.5. I don't know why I can't
get the content of the embedded file.
      Could anyone help me? Thank you very much.
                                                                                 Ping Liu
                                                                               18 June. 2015