pdfbox-users mailing list archives

From "robyp7 ." <rob...@gmail.com>
Subject user a filter in a PDFStripper parsing
Date Wed, 07 Oct 2015 12:13:35 GMT
Hi,

I have a question about PDFTextStripper.

While the PDF is parsed line by line, I need to extract only the lines that
match certain keywords or text patterns, and I need this separately for each
page (not for all pages of the PDF at once).


For example, given a PDF page like:
ABC 123
xyg 4
zz 2

I only need to obtain the text:

ABC 123
zz 2

I also need the position on the page of each extracted piece of text.

So I imagine using a filter during parsing, something like:

public class MyFilter {

    // return true to keep the line, false to skip it
    public boolean accept(String text) {
        // ..
    }
}

During the PDF parsing (line by line), PDFBox would call the accept method.
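
To make the idea concrete, here is a minimal sketch of such a filter in plain Java. It does not use PDFBox at all: the class name KeywordLineFilter, the filter() helper and the regular expression are hypothetical, chosen only to match the sample lines above, and the sketch assumes the text has already been extracted into a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter: the class name and the accept() contract are
// illustrative only and are not part of PDFBox.
final class KeywordLineFilter {
    private final Pattern pattern;

    KeywordLineFilter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns true when the line should be kept.
    boolean accept(String line) {
        return pattern.matcher(line).find();
    }

    // Applies accept() to the text line by line and returns the kept lines in order.
    List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            if (accept(line)) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

With the pattern "^(ABC|zz)\\s+\\d+$", the line "xyg 4" is rejected and the other two sample lines are kept.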

Is there an extension point (a subclass or an interface to implement) that
does this, and how would I add it to PDFBox?

I'm checking the source code but I can't find one. I see that the writeText
method returns the text of all pages, not of each page separately.

If there is no such solution, I will have to run the filter over the entire
text string and use page tags, for example:

Page n 1
ABC 123
xyg 4
zz 1

..
..

Page n 2
ABC 456
xyhk
zz 2
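
The fallback described above could look roughly like the following sketch. It assumes the extracted text already carries "Page n <number>" tag lines exactly as in the example; the names PageTaggedFilter, Match and extract() are hypothetical. The "position" recorded here is only the line index within the page, a stand-in for real coordinates, which would need PDFBox's per-character position data and are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: filters page-tagged text and
// remembers on which page and line each kept line was found.
final class PageTaggedFilter {

    // One kept line with its page number and 1-based line index on that page.
    static final class Match {
        final int page;
        final int lineOnPage;
        final String text;

        Match(int page, int lineOnPage, String text) {
            this.page = page;
            this.lineOnPage = lineOnPage;
            this.text = text;
        }
    }

    // Assumed tag format, taken from the example: "Page n 1", "Page n 2", ...
    private static final Pattern PAGE_TAG = Pattern.compile("^Page n (\\d+)$");

    static List<Match> extract(String taggedText, Pattern keep) {
        List<Match> matches = new ArrayList<>();
        int page = 0;       // 0 means no page tag has been seen yet
        int lineOnPage = 0;
        for (String raw : taggedText.split("\\r?\\n")) {
            String line = raw.trim();
            Matcher tag = PAGE_TAG.matcher(line);
            if (tag.matches()) {
                // A tag line starts a new page and resets the line counter.
                page = Integer.parseInt(tag.group(1));
                lineOnPage = 0;
                continue;
            }
            if (line.isEmpty()) {
                continue;   // blank lines are not counted
            }
            lineOnPage++;
            if (page > 0 && keep.matcher(line).find()) {
                matches.add(new Match(page, lineOnPage, line));
            }
        }
        return matches;
    }
}
```

Calling PageTaggedFilter.extract(text, Pattern.compile("^(ABC|zz)\\s+\\d+$")) on the two example pages would keep the ABC and zz lines of each page and skip the others.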
