pdfbox-users mailing list archives

From Francisco Andrés Fernández <fra...@gmail.com>
Subject Bad text extraction result
Date Wed, 24 Feb 2016 19:17:37 GMT
Hi all,
I'm extracting text from PDF files through Tika in Solr. As a result, some
important words come out with spaces between their characters.
For example, the word "Subtitle", which I want to detect, may be
extracted as "S u b t i t l e".
How can I make PDFBox detect this kind of word occurrence?
Many thanks,

Francisco
