I have a PDF file from which I am trying to extract text. Unfortunately, the document is non-sequential and contains various boxes with supplementary content. When I open the file in Acrobat Reader, Reader seems able to distinguish these features and can surround them with a blue bounding box. I would like to extract text by area from within these bounding boxes. Is PDFBox also capable of detecting these features?
I have attached a screenshot showing the style of box I am referring to (top right-hand corner).