lucene-solr-user mailing list archives

From Erik Hatcher <erik.hatc...@gmail.com>
Subject Re: Issues when indexing PDF files
Date Wed, 16 Dec 2015 16:15:12 GMT
Edwin - Can you share one of those PDF files?

Also, drop the file into the Tika app to see what Tika extracts from it directly: download the tika-app JAR
and run it as a desktop application.
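Below is a minimal Python sketch of that check, run from a script instead of the desktop GUI. It assumes Java is on the PATH and that the downloaded JAR is saved as `tika-app.jar` (a placeholder name). The `--text` and `--encoding` options come from the tika-app command-line usage text; confirm them with `--help` for the version you download. `classify_extracted_text` is a hypothetical helper. It separates the two reported symptoms (empty content, or runs of `?`) from text that really contains CJK ideographs.

```python
import re
import subprocess
import unicodedata


def extract_with_tika_app(pdf_path, jar_path="tika-app.jar"):
    """Run the Tika app in plain-text mode and return what it extracted.

    jar_path is a placeholder: point it at the tika-app JAR you downloaded.
    --encoding=UTF-8 asks Tika to write UTF-8 so the output decodes reliably.
    """
    result = subprocess.run(
        ["java", "-jar", jar_path, "--encoding=UTF-8", "--text", pdf_path],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def classify_extracted_text(text):
    """Hypothetical helper: label extracted text by the symptom it shows."""
    stripped = text.strip()
    if not stripped:
        return "empty"
    # Real Chinese characters survived extraction.
    if any(unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")
           for ch in stripped):
        return "has_cjk"
    # Runs of '?' (or U+FFFD) suggest characters were lost in a charset conversion.
    if re.search(r"[?\ufffd]{3,}", stripped):
        return "question_marks"
    return "other"
```

For example, `classify_extracted_text(extract_with_tika_app("problem.pdf"))` runs on one of the affected files. Suppose Tika's own output already contains the Chinese characters. Then the loss probably happens after extraction. If Tika also returns `?` or nothing, look at the PDF itself or at how Tika handles it.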

Could it be an encoding issue?
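Here is one way to picture the encoding hypothesis. Suppose a step converts text to a charset with no CJK coverage, such as ISO-8859-1, and replaces unmappable characters. Each Chinese character then comes out as a single `?`. The sketch below simulates such a step with Python's `str.encode(..., errors="replace")`. It only shows how "??????" can arise. It does not show where, if anywhere, this happens in Edwin's pipeline, and it does not explain the empty-content case.

```python
def lossy_round_trip(text, charset):
    """Encode text in `charset`, then decode it again.

    This mimics a mismatched charset somewhere in a pipeline. With
    errors="replace", every character the charset cannot represent
    becomes '?' on the way out.
    """
    return text.encode(charset, errors="replace").decode(charset)
```

For instance, `lossy_round_trip("中文文档", "latin-1")` returns `"????"`, one `?` per character. The same call with `"utf-8"` returns the text unchanged.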

	Erik

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com <http://www.lucidworks.com/>



> On Dec 16, 2015, at 10:51 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> 
> Hi,
> 
> I'm using Solr 5.3.0
> 
> I'm indexing some PDF documents. However, for certain PDF files that
> contain Chinese text, the indexed content is either a series of "??????"
> or empty.
> 
> I'm using the post.jar that comes together with Solr.
> 
> What could be causing this?
> 
> Regards,
> Edwin

