pdfbox-users mailing list archives

From Tilman Hausherr <THaush...@t-online.de>
Subject Re: extract bullet points from a PDF
Date Thu, 29 Sep 2016 19:17:38 GMT
On 29.09.2016 at 21:11, Harrington, Ferdinand B wrote:
> I found PDFText2HTML.java. Is there an example of how to call it?

Yes, see TestPDFText2HTML.java

I doubt that it can do indents.
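
If the extracted text does keep its leading spaces (an assumption; PDFBox may not guarantee this for every PDF), the indent levels could be recovered with plain Java. A minimal sketch follows. The class IndentLevels and its methods are hypothetical names for illustration, not part of PDFBox. It emits <item level="n"> elements rather than one tag per item, because names such as 123 are not valid XML element names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: maps indented lines to levels.
final class IndentLevels {

    /** Counts the leading space characters of a line. */
    static int leadingSpaces(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == ' ') {
            n++;
        }
        return n;
    }

    /** Escapes the characters that are special in XML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Turns each non-blank line into an item element. The level is the
     * number of distinct, smaller indent widths that enclose the line.
     */
    static List<String> toItems(List<String> lines) {
        List<String> out = new ArrayList<>();
        List<Integer> widths = new ArrayList<>(); // stack of open indent widths
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines only separate groups
            }
            int indent = leadingSpaces(line);
            // Close every level that is indented deeper than this line.
            while (!widths.isEmpty() && widths.get(widths.size() - 1) > indent) {
                widths.remove(widths.size() - 1);
            }
            // Open a new level if this line is deeper than the current one.
            if (widths.isEmpty() || widths.get(widths.size() - 1) < indent) {
                widths.add(indent);
            }
            int level = widths.size() - 1;
            String text = line.trim();
            if (text.startsWith("- ")) {
                text = text.substring(2).trim(); // drop a leading bullet dash
            }
            out.add("<item level=\"" + level + "\">" + escape(text) + "</item>");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Abc", "Def", "      Xyz", "      Ghi",
                "           123", "           456");
        for (String item : toItems(sample)) {
            System.out.println(item);
        }
    }
}
```

For the sample quoted below, this gives level 0 for Abc and Def, level 1 for Xyz and Ghi, and level 2 for 123 and 456.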

Tilman

> Outlook distorted my message. The data is indented like this,
> as bullets:
>
> Abc
> Def
>       Xyz
>       Ghi
>            123
>            456
>
> Thank you.
>
> -----Original Message-----
> From: Tilman Hausherr [mailto:THausherr@t-online.de]
> Sent: Thursday, September 29, 2016 2:44 PM
> To: users@pdfbox.apache.org
> Subject: Re: extract bullet points from a PDF
>
> On 29.09.2016 at 15:08, win harrington wrote:
>> I would like to extract all the lists of bullet points from a PDF file and put them
>> into an XML format.
>> The items are indented. I want the text and the indentation level.
>> The input is like this:
>>      - abc
>>      - def
>>
>>      - xyz
>>      - ghi
>>
>>      - 123
>>      - 456
>>
>>
>> Can I convert that to:
>> abc
>> def
>>    xyz
>>    ghi
>>       123
>>       456
>> The last step will be to add tags. I have code to do this:
>> <abc></abc>
>> <def></def>
>>     <xyz></xyz>
>>     <ghi></ghi>
>>        <123></123>
>>        <456></456>
> This sounds like an ordinary Java question, i.e. parsing some text. PDFBox
> does have some rudimentary paragraph detection, but I don't know if it
> works. Try the PDFText2HTML tool in the source download.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org

