pdfbox-users mailing list archives

From Balaji Venkatamohan <bvenk...@tibco.com>
Subject Re: How to flatedecode and find all acroform fields in a compressed PDF
Date Wed, 27 May 2015 18:22:28 GMT

Alright!

On Wed, May 27, 2015 at 11:20 AM, Tilman Hausherr <THausherr@t-online.de>
wrote:

> On 27.05.2015 at 20:02, Balaji Venkatamohan wrote:
>
>> Thanks, Tilman, for letting the website developer know about the
>> shortcomings of their compression technique.
>>
>> The PDF owner did not tell us which website they used for compressing
>> the PDF; my teammates helped identify it. I will let the customer know
>> about this particular website and leave it to them whether to keep
>> using it for their PDF documents.
>>
>> Could you also answer the following question, please?
>> Would PDFBox change its code to also accommodate the incorrect case where
>> annotation fields (editable fields) are outside the AcroForm fields? I
>> know the PDF compressed by the website is incorrect, so I would
>> understand if you don't go ahead with this.
>>
>>
>
> As I said, I'm not the acroform specialist here. So I can't tell if it is
> possible to repair these, and whether there would be side effects, e.g.
> PDFs with annotations ending up being forms. Yes, we've done all sorts of
> things to accommodate broken PDFs. But here the fault is known: it is a
> website that deletes data from PDFs to "compress" them. The better
> solution would be to have this guy fix his website, i.e. allow options to
> decide what is to be removed and what not. Another solution (which I
> mentioned before) would be to have your customer compress his PDFs with
> the method I mentioned in this thread, i.e. if this customer of yours
> generates PDFs but doesn't have the knowledge to compress the streams. He
> could of course look into our source code (FlateFilter.java, it is just
> 10 lines) and see how to do the compression himself.
>
> Tilman
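Tilman's pointer to FlateFilter.java can be illustrated without PDFBox at all: /FlateDecode stream data is zlib-wrapped deflate data, which is what the JDK's java.util.zip.Deflater produces with its default settings. The sketch below is not PDFBox's FlateFilter; FlateCompressor and compress are hypothetical names, and it assumes the raw stream bytes are already in memory.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/**
 * Hypothetical helper: Flate-compresses raw stream data with the JDK's zlib
 * binding. The output is zlib-wrapped deflate data, the format that a PDF
 * stream declares with /Filter /FlateDecode.
 */
final class FlateCompressor {
    private FlateCompressor() {}

    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish(); // no more input: emit everything on the next calls
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib state
        }
    }
}
```

The compressed bytes would then go between `stream` and `endstream`, with /Length set to the compressed size and /Filter /FlateDecode added to the stream dictionary. This only shrinks streams that were stored uncompressed; it would not reproduce a result like the 21 KB file, which came from deleting data.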
>
>
>
>> Thanks,
>> Balaji
>>
>>
>> On Tue, May 26, 2015 at 10:45 PM, Tilman Hausherr <THausherr@t-online.de>
>> wrote:
>>
>>> I just tested it. It also removes /Outlines, /Metadata and other
>>> important data from PDF files.
>>>
>>> So your client can't share the PDF with us, but he shared it with some
>>> website.
>>>
>>> A little research shows that this website is owned by Lauri Lehtinen from
>>> Tallinn, Estonia.
>>> http://www.checkdomain.com/cgi-bin/checkdomain.pl?domain=pdfcompress.com
>>> https://www.linkedin.com/in/laurilehtinen
>>> https://twitter.com/laurii
>>>
>>> I also tweeted him.
>>>
>>> Tilman
>>>
>>>
>>> On 27.05.2015 at 03:06, Balaji Venkatamohan wrote:
>>>
>>>> Okay, I found out which online tool the customer used to compress their
>>>> PDF.
>>>>
>>>> It is: https://www.pdfcompress.com/
>>>>
>>>> I don't need to rely on the PDF sent by the customer, because any PDF
>>>> available on the web gets compressed in the same manner by this tool,
>>>> that is, it gets rid of all AcroForm fields during compression.
>>>>
>>>> For example, take the f941 government form available at this site:
>>>> http://www.irs.gov/pub/irs-pdf/f941.pdf
>>>> If we compress it using the online tool, the resulting file is very
>>>> small, which is good. However, there are no AcroForm fields in the
>>>> compressed PDF.
>>>>
>>>> Thanks,
>>>> Balaji
>>>>
>>>> On Sun, May 24, 2015 at 2:38 AM, Maruan Sahyoun <sahyoun@fileaffairs.de>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On 23.05.2015 at 16:37, Balaji Venkatamohan <bvenkata@tibco.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>> So AcroForms/Fields is an empty Array?
>>>>>>
>>>>>> Yes. In the filled interview_compressed.pdf, the AcroForm is not null,
>>>>>> but its Fields array is empty: the size of the array is zero.
>>>>>>
>>>>>> Also, I tried the qpdf command line tool to compress interview.pdf,
>>>>>> and the resulting file size of 1.6 MB was nowhere near the file size
>>>>>> of interview_compressed.pdf (21 KB).
>>>>>
>>>>> Do you think it's possible to get a similar PDF file, or permission to
>>>>> use it internally, so that we have a sample to look at for a potential
>>>>> fix?
>>>>>
>>>>> Although the PDF is not in line with the spec, Acrobat is able to handle
>>>>> it, so we could look into getting a similar result.
>>>>>
>>>>> BR
>>>>> Maruan
>>>>>
>>>>>> Thanks,
>>>>>> Balaji
>>>>>>
>>>>>> On Fri, May 22, 2015 at 11:58 PM, Maruan Sahyoun <sahyoun@fileaffairs.de>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 22.05.2015 at 23:00, Balaji Venkatamohan <bvenkata@tibco.com> wrote:
>>>>>>>
>>>>>>>> I opened interview_compressed.pdf in Notepad++ and did not see any
>>>>>>>> 'AcroForm' text anywhere.
>>>>>>>> However, as Maruan suggested, I entered some data into what look like
>>>>>>>> form fields of interview_compressed.pdf and saved it. When I opened
>>>>>>>> this file in Notepad++, I did see 'AcroForm' text in it. I also
>>>>>>>> noticed an increase in file size from 21 KB to ~530 KB.
>>>>>>>> I then opened this filled and saved compressed PDF in PDFDebugger and
>>>>>>>> saw that the field values were getting stored, not under the AcroForm
>>>>>>>> fields but under the annotations.
>>>>>>>
>>>>>>> So AcroForms/Fields is an empty Array?
>>>>>>>
>>>>>>>> Please refer to this image:
>>>>>>>> http://imageshack.com/a/img540/9951/QGLDtS.jpg
>>>>>>>>
>>>>>>>> So, whatever the compression technique was, did it simply make all the
>>>>>>>> AcroForm fields disappear from the original PDF while retaining all the
>>>>>>>> annotations, which also contain the interactive form, and is that what
>>>>>>>> reduced the file size so much? If this is the case, can the PDFBox API
>>>>>>>> also use a similar technique to compress such a huge file into a
>>>>>>>> smaller one?
>>>>>>>>
>>>>>>>> On Fri, May 22, 2015 at 1:25 PM, Maruan Sahyoun <sahyoun@fileaffairs.de>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 22.05.2015 at 21:54, Tilman Hausherr <THausherr@t-online.de> wrote:
>>>>>>>>>
>>>>>>>>>> On 22.05.2015 at 17:53, Balaji Venkatamohan wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I used PDFDebugger to make the internal PDF structure of the two
>>>>>>>>>>> files, (1) interview.pdf and (2) interview_compressed.pdf, visually
>>>>>>>>>>> available, and I have uploaded my images to imageshack. Here are the
>>>>>>>>>>> five links:
>>>>>>>>>>> http://imageshack.com/a/img538/8277/JghCpG.jpg
>>>>>>>>>>> http://imageshack.com/a/img909/6140/KsYNGR.jpg
>>>>>>>>>>> http://imageshack.com/a/img903/8644/mk15As.jpg
>>>>>>>>>>> http://imageshack.com/a/img901/8610/NXe3mJ.jpg
>>>>>>>>>>> http://imageshack.com/a/img673/8633/0GMdjQ.jpg
>>>>>>>>>>>
>>>>>>>>>>> The first two links show the internal structure of interview.pdf
>>>>>>>>>>> (the original, uncompressed file).
>>>>>>>>>>> The third and fourth links show the internal structure of
>>>>>>>>>>> interview_compressed.pdf (the compressed file).
>>>>>>>>>>> The fifth link compares the file sizes of the two files, and as you
>>>>>>>>>>> can see, the difference is huge.
>>>>>>>>>>>
>>>>>>>>>>> As you might notice, the file interview_compressed.pdf has no
>>>>>>>>>>> acroform
>>>>>>>>>>
>>>>>>>>>> Indeed... but this is needed - from the spec:
>>>>>>>>>> "The contents and properties of a document’s interactive form shall
>>>>>>>>>> be defined by an interactive form dictionary that shall be referenced
>>>>>>>>>> from the AcroForm entry in the document catalogue (see 7.7.2,
>>>>>>>>>> “Document Catalog”). Table 218 shows the contents of this
>>>>>>>>>> dictionary."
>>>>>>>>>>
>>>>>>>>>>> correct fields listed, even though opening the PDF in a PDF reader
>>>>>>>>>>> allows me to enter values in places which look like AcroForm fields
>>>>>>>>>>> and also save them. Are there any other PDF 'types' similar to
>>>>>>>>>>> AcroForm fields which would enable users to fill in data and which
>>>>>>>>>>> can be accessed through the PDFBox APIs without having to go through
>>>>>>>>>>> PDAcrofield?
>>>>>>>>>>
>>>>>>>>>> Yes, annotations... there are some common parts, but this is just a
>>>>>>>>>> vague observation from me, I'm not the acroform specialist.
>>>>>>>>>
>>>>>>>>> From a first glance it looks like all the entries necessary to
>>>>>>>>> (re-)generate the form fields are there. That's what's likely
>>>>>>>>> happening for this document in Adobe Reader. It would be interesting
>>>>>>>>> to see what's being saved after the form has been filled out and
>>>>>>>>> saved using Acrobat. We'd need a test form to come up with an
>>>>>>>>> enhancement like this.
>>>>>>>>>
>>>>>>>>> BR
>>>>>>>>> Maruan
>>>>>>>>>
>>>>>>>>>> What you should do: use NOTEPAD++ to look whether there's
>>>>>>>>>> "/AcroForm" in the "compressed" file.
>>>>>>>>>> - if it is missing, tell the client (or your boss) just that
>>>>>>>>>> - if it isn't missing, then there's some problem in PDFBox (also try
>>>>>>>>>> the loadNonSeq I mentioned earlier)
>>>>>>>>>>
>>>>>>>>>> Tilman
>>>>>>>>>>
>>>>>>>>>>>> You can use qpdf, then use these options:
>>>>>>>>>>>
>>>>>>>>>>> I will now try using this link to compress the original file.
>>>>>>>>>>>
>>>>>>>>>>>> Another strategy to think about - can your client generate a
>>>>>>>>>>>> non-confidential file, so that you can share it, and the
>>>>>>>>>>>> "compressed" file?
>>>>>>>>>>>
>>>>>>>>>>> I wish I had direct communication with the clients, but due to
>>>>>>>>>>> bureaucracy, I am having to go through multiple layers to get my
>>>>>>>>>>> message across to them. I will share more information as soon as I
>>>>>>>>>>> have it.
>>>>>>>>>>>
>>>>>>>>>>> PS: I sent these image links to my personal email first to make sure
>>>>>>>>>>> that I can open them. I could, so I am hoping you all can too. If
>>>>>>>>>>> you are unable to open them, please let me know.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Balaji
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 22, 2015 at 6:45 AM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 22.05.2015 at 08:28, Andreas Lehmkühler wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Balaji Venkatamohan <bvenkata@tibco.com> wrote on 20 May 2015 at
>>>>>>>>>>>>> 03:24:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for your pointers, and sorry about the image. I am
>>>>>>>>>>>>>> attaching it to this email.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The point I am trying to make is that the PDF which was
>>>>>>>>>>>>>> decompressed using WriteDecodedDoc is smaller in size than the
>>>>>>>>>>>>>> original PDF given to us by our customers.
>>>>>>>>>>>>>> Also, the decompressed PDF generated by WriteDecodedDoc of PDFBox
>>>>>>>>>>>>>> did not have any PDAcroForm fields, whereas the decompressed PDF
>>>>>>>>>>>>>> given to us by the customers does contain AcroForm fields. Hence I
>>>>>>>>>>>>>> wanted to know how to properly decompress the PDF using the PDFBox
>>>>>>>>>>>>>> APIs. The reason why I was analyzing COSStream was to check whether
>>>>>>>>>>>>>> the decompression of the compressed PDF was happening correctly
>>>>>>>>>>>>>> while using the PDFBox APIs.
>>>>>>>>>>>>>> I know it would have been difficult for you to help me without the
>>>>>>>>>>>>>> actual PDFs. For that, I would like to thank you for your time and
>>>>>>>>>>>>>> pointers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe it's worth trying to share the file "visually" with us. Open
>>>>>>>>>>>>> both files (compressed and decompressed) with PDFDebugger [1], post
>>>>>>>>>>>>> a screenshot of both somewhere (Dropbox etc.) and share the link
>>>>>>>>>>>>> with us. Maybe that could shed some light on your issue.
>>>>>>>>>>>>
>>>>>>>>>>>> @Balaji: here's an example of how such a screenshot would look:
>>>>>>>>>>>> http://home.snafu.de/tilman/tmp/pdfdebugger-screenshot.png
>>>>>>>>>>>>
>>>>>>>>>>>> Tilman
>>>>>>>>>>>>
>>>>>>>>>>>>> BR
>>>>>>>>>>>>> Andreas Lehmkühler
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] http://pdfbox.apache.org/1.8/commandline.html#pdfDebugger
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, May 19, 2015 at 2:57 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The image doesn't appear in the mailing list.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is all very confusing... /AcroForm is in the document catalog.
>>>>>>>>>>>>>>> I don't see how the page content stream is related to it. The best
>>>>>>>>>>>>>>> is that you either go through the source code, or read the spec and
>>>>>>>>>>>>>>> then look at the PDF.
>>>>>>>>>>>>>>> To find out what's going on, you'd have to start from that
>>>>>>>>>>>>>>> /AcroForm entry and then compare the two files.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is really difficult to help you without the files. The cause
>>>>>>>>>>>>>>> could be a bug in PDFBox, or a malformed PDF...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Some more ideas:
>>>>>>>>>>>>>>> - use loadNonSeq(file, null) instead of load(file)
>>>>>>>>>>>>>>> - try the unreleased 2.0 version, which has some improvements in
>>>>>>>>>>>>>>> the AcroForm code. Note that the API is different.
>>>>>>>>>>>>>>> https://pdfbox.apache.org/download.cgi#scm
>>>>>>>>>>>>>>> https://pdfbox.apache.org/2.0/getting-started.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you still need help, one possibility would be to 1) post the
>>>>>>>>>>>>>>> smallest possible code that fails, and 2) post a small part of the
>>>>>>>>>>>>>>> raw PDF, i.e. the objects relevant to the field in your code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tilman
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 19.05.2015 at 23:03, Balaji Venkatamohan wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Moreover, for every page of the compressed PDF (there are 3
>>>>>>>>>>>>>>>> pages), I tried getting the COSStream of the page:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> // PDFBox 1.8: getAllPages() returns a raw List, hence the cast
>>>>>>>>>>>>>>>> PDPage firstPage = (PDPage) document.getDocumentCatalog().getAllPages().get(0);
>>>>>>>>>>>>>>>> PDStream pdStream = firstPage.getContents();
>>>>>>>>>>>>>>>> COSStream stream = pdStream.getStream();
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In the above code snippet, the object stream, when analyzed in
>>>>>>>>>>>>>>>> debug mode, has the following:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The line from the compressed PDF as opened with Notepad++ is:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <</Filter/FlateDecode/Length 5675>>stream
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> From this point on, using the COSStream object for every page, how
>>>>>>>>>>>>>>>> can I decompress it and find the AcroForm fields, given that the
>>>>>>>>>>>>>>>> unFilteredStream object is null for COSStream?
>>>>>>>>>>>>>>>>
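For a stream dictionary like `<</Filter/FlateDecode/Length 5675>>stream`, the decoding step itself can be shown with the JDK's Inflater. This is a sketch under stated assumptions (a direct /Length, FlateDecode as the only filter, no /DecodeParms predictor); FlateStreamDecoder and decode are hypothetical names. As Tilman points out, a page content stream is not where /AcroForm lives, so decoding it will not reveal form fields.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/**
 * Hypothetical helper that decodes one stream object such as
 * "<</Filter/FlateDecode/Length 5675>>stream ... endstream".
 * Assumptions: /Length is a direct integer (not "12 0 R"), FlateDecode is
 * the only filter, and there is no /DecodeParms predictor.
 */
final class FlateStreamDecoder {
    // The keyword "stream" must be followed by CRLF or LF (ISO 32000-1, 7.3.8.1).
    private static final Pattern STREAM_KEYWORD = Pattern.compile("stream\\r?\\n");
    // Possessive quantifiers stop "/Length 12 0 R" from matching as "/Length 1".
    private static final Pattern DIRECT_LENGTH =
            Pattern.compile("/Length\\s++(\\d++)(?!\\s+\\d+\\s+R)");

    private FlateStreamDecoder() {}

    static byte[] decode(byte[] streamObject) {
        // ISO-8859-1 keeps char offsets equal to byte offsets.
        String text = new String(streamObject, StandardCharsets.ISO_8859_1);
        Matcher keyword = STREAM_KEYWORD.matcher(text);
        if (!keyword.find()) {
            throw new IllegalArgumentException("no 'stream' keyword found");
        }
        String dictionary = text.substring(0, keyword.start());
        if (!dictionary.contains("/FlateDecode")) {
            throw new IllegalArgumentException("not a /FlateDecode stream");
        }
        Matcher length = DIRECT_LENGTH.matcher(dictionary);
        if (!length.find()) {
            throw new IllegalArgumentException("no direct /Length entry");
        }
        int start = keyword.end();
        long end = start + Long.parseLong(length.group(1));
        if (end > streamObject.length) {
            throw new IllegalArgumentException("data shorter than /Length");
        }
        return inflate(Arrays.copyOfRange(streamObject, start, (int) end));
    }

    private static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater(); // expects zlib-wrapped input, as FlateDecode uses
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated Flate data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt Flate data", e);
        } finally {
            inflater.end();
        }
    }
}
```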
>>>>>>>>>>>>>>>> On Tue, May 19, 2015 at 1:38 PM, Balaji Venkatamohan <bvenkata@tibco.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you for your response, Tilman.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I had previously tried using WriteDecodedDoc on my compressed PDF,
>>>>>>>>>>>>>>>>> and I tried to get the number of AcroForm fields present in the
>>>>>>>>>>>>>>>>> output file generated by WriteDecodedDoc. The API still could not
>>>>>>>>>>>>>>>>> find the AcroForm fields in the generated decompressed file.
>>>>>>>>>>>>>>>>> Also, the decompressed file generated is 75 KB, which is far less
>>>>>>>>>>>>>>>>> than the original decompressed file which I have (1.6 MB), though
>>>>>>>>>>>>>>>>> I could edit the AcroForm fields using Acrobat Reader.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Balaji
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, May 19, 2015 at 1:18 PM, Tilman Hausherr <THausherr@t-online.de>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 19.05.2015 at 21:35, Balaji Venkatamohan wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> My question is: how do I FlateDecode a PDF so that I can find all
>>>>>>>>>>>>>>>>>>> the AcroForm fields within it? Any help or pointers would be
>>>>>>>>>>>>>>>>>>> highly appreciated.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You could try the WriteDecodedDoc option of the command line app:
>>>>>>>>>>>>>>>>>> https://pdfbox.apache.org/1.8/commandline.html#writeDecodeDoc
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Maybe you can get further ideas by comparing the two files with
>>>>>>>>>>>>>>>>>> NOTEPAD++... however, the two files might have their objects in
>>>>>>>>>>>>>>>>>> a different order.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Tilman
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>>>>>>>>>>>>>>>> For additional commands, e-mail: users-help@pdfbox.apache.org
