pdfbox-users mailing list archives

From win harrington <win_harring...@yahoo.com.INVALID>
Subject extract bullet points from a PDF
Date Thu, 29 Sep 2016 13:08:51 GMT
I would like to extract all the lists of bullet points from a PDF file and put them into an
XML format.
The items are indented. I want the text and the indentation level.
The input is like this:
   - abc
   - def
      - xyz
      - ghi
         - 123
         - 456


Can I convert that to:
abc
def
   xyz
   ghi
      123
      456
The last step will be to add tags. I have code to do this:
<abc></abc>
<def></def>
   <xyz></xyz>
   <ghi></ghi>
      <123></123>
      <456></456>
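
For the indentation step, here is a rough sketch of what I have in mind, not tested against a real PDF. It assumes I can get each bullet line's text and the x coordinate where it starts. With PDFBox 2.x that might come from overriding PDFTextStripper.writeString(String, List<TextPosition>) and reading getXDirAdj() on the first TextPosition, but that part is not shown. The names BulletIndent, Item and TOLERANCE are made up for this example, and the 2.0 tolerance is a guess.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: turns (text, x offset) pairs into indentation levels.
final class BulletIndent {

    // One bullet item: its text and the x coordinate where the text starts.
    static final class Item {
        final String text;
        final float x;

        Item(String text, float x) {
            this.text = text;
            this.x = x;
        }
    }

    // Assumption: x offsets within this distance of a level's leftmost offset
    // belong to that level.
    static final float TOLERANCE = 2.0f;

    // Returns a 0-based level per item: leftmost offset is level 0, the next is 1, ...
    static List<Integer> levels(List<Item> items) {
        TreeSet<Float> sorted = new TreeSet<>();
        for (Item item : items) {
            sorted.add(item.x);
        }
        // Collapse nearly equal offsets into one "indent stop" per level.
        List<Float> stops = new ArrayList<>();
        for (float x : sorted) {
            if (stops.isEmpty() || x - stops.get(stops.size() - 1) > TOLERANCE) {
                stops.add(x);
            }
        }
        // An item's level is the index of the rightmost stop at or left of its x.
        List<Integer> result = new ArrayList<>();
        for (Item item : items) {
            int level = 0;
            for (int i = 0; i < stops.size(); i++) {
                if (stops.get(i) <= item.x) {
                    level = i;
                }
            }
            result.add(level);
        }
        return result;
    }

    // Renders each item as its text prefixed by three spaces per level.
    static List<String> toIndentedLines(List<Item> items) {
        List<Integer> lv = levels(items);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            StringBuilder sb = new StringBuilder();
            for (int k = 0; k < lv.get(i); k++) {
                sb.append("   ");
            }
            out.add(sb.append(items.get(i).text).toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up x offsets standing in for what the PDF would report.
        List<Item> items = Arrays.asList(
                new Item("abc", 72f), new Item("def", 72f),
                new Item("xyz", 90f), new Item("ghi", 90.5f),
                new Item("123", 108f), new Item("456", 108f));
        for (String line : toIndentedLines(items)) {
            System.out.println(line);
        }
    }
}
```

The level is taken from the rank of the x offset rather than from a fixed indent width, since I do not know how far each level is indented in the PDF.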

Thank you. Win Harrington


