pdfbox-dev mailing list archives

From Daniel Wilson <williamstonconsult...@gmail.com>
Subject Re: Bug-Fix for Scratch File Bug
Date Mon, 29 Aug 2011 19:31:23 GMT
Hmm ... yeah, I'm afraid you do need an account.

On Mon, Aug 29, 2011 at 5:24 PM, Stefan Mücke <s.muecke@devsup.de> wrote:

> > License considerations severely limit what we can do with patches
> > provided via the mailing list.
> >
> > Would you please create an issue at
> > https://issues.apache.org/jira/browse/PDFBOX ?  When you attach your
> > patch files, please check the box that grants us the right to use the
> > files.
>
> Okay, but I have trouble finding an "Add/New/Create/Report issue" button.
> Do I need to have an account? I don't really want to create one.
>
> Stefan
>
>
> > May sound silly ... but ... it keeps everything legal!
> >
> > Thanks!
> >
> > Daniel
> >
> > On Sat, Aug 27, 2011 at 5:04 PM, Stefan Mücke <s.muecke@devsup.de> wrote:
> >
> > > Hi PDFBox committers,
> > >
> > > I would like to contribute a bug fix for a long-standing, major problem
> > > in PDFBox.
> > >
> > > PDFBox uses a scratch file to reduce memory consumption. However, there
> > > is no mechanism that prevents two PDStreams from writing to the
> > > scratch file at the same time. When this happens, the resulting PDF
> > > contains garbage in some streams. I have run into this problem several
> > > times (e.g. when writing to an image stream while constructing a
> > > page).
> > >
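The failure mode described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `Scratch` and `NaiveStream` are hypothetical names, not PDFBox classes, and the model assumes each stream records its start offset when it is created and then appends from there. Under that assumption, two streams opened before either one writes will claim the same region.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SharedScratchDemo {

    /** Hypothetical shared scratch area, modelled as a growable byte array. */
    static final class Scratch {
        byte[] data = new byte[0];

        void writeAt(int pos, byte[] src) {
            int end = pos + src.length;
            if (end > data.length) {
                data = Arrays.copyOf(data, end);
            }
            System.arraycopy(src, 0, data, pos, src.length);
        }

        int length() {
            return data.length;
        }
    }

    /** Hypothetical stream: remembers the scratch length at creation and appends from there. */
    static final class NaiveStream {
        private final Scratch scratch;
        private final int start;
        private int written = 0;

        NaiveStream(Scratch scratch) {
            this.scratch = scratch;
            this.start = scratch.length();
        }

        void write(String text) {
            byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
            scratch.writeAt(start + written, bytes);
            written += bytes.length;
        }

        String read() {
            return new String(scratch.data, start, written, StandardCharsets.US_ASCII);
        }
    }

    /** Opens two streams before either writes, then writes to both. */
    static String[] run() {
        Scratch scratch = new Scratch();
        NaiveStream image = new NaiveStream(scratch); // starts at offset 0
        NaiveStream page = new NaiveStream(scratch);  // also starts at offset 0
        image.write("IMAGEDATA");
        page.write("page");                           // overwrites the image bytes
        return new String[] { image.read(), page.read() };
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("image stream: " + result[0]);
        System.out.println("page stream:  " + result[1]);
    }
}
```

In this model the image stream reads back bytes that the page stream wrote over it, which is the kind of garbage the report describes. Whether PDFBox's scratch file fails in exactly this way is not stated in the message.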
> > > Reproducing the bug
> > > *******************
> > >
> > > One can easily reproduce the bug. Open file AddImageToPDF.java and move
> > > the following line:
> > >
> > >    PDPageContentStream contentStream =
> > >        new PDPageContentStream(doc, page, true, true);
> > >
> > > immediately after the line in which the PDPage object is fetched:
> > >
> > >    PDPage page =
> > >        (PDPage)doc.getDocumentCatalog().getAllPages().get( 0 );
> > >
> > > With this modification, one will still get a PDF file, but Acrobat
> > > Reader will report that the image could not be processed. BTW, the
> > > files AddImageToPDF.java and ImageToPDF.java are almost identical. One
> > > of them should be deleted.
> > >
> > > Bug-Fix
> > > *******
> > >
> > > The problem can be solved by using a scratch file that is divided into
> > > pages (e.g. of 4 KB). Each PDStream in the scratch file is then
> > > associated with a list of pages. This list grows as more data is
> > > written to the stream.
> > >
> > > The bug fix requires minimal changes to the existing code. The very
> > > nice RandomAccess interface made this very easy.
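The paged layout described above can be sketched as follows. This is not the attached PagedMultiRandomAccessFile.java, whose contents are not shown in this message. It is a minimal in-memory illustration with hypothetical names (`PagedScratchSketch`, `Stream`, `newStream`), and it uses a tiny page size so the page boundaries are easy to follow. A real implementation would keep the pages in a file.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PagedScratchSketch {

    /** Tiny page size for demonstration; the proposal suggests pages of e.g. 4 KB. */
    static final int PAGE_SIZE = 4;

    /** Shared backing store; each entry is one fixed-size page. */
    private final List<byte[]> pages = new ArrayList<>();

    /** Appends a fresh page to the store and returns its index. */
    private int allocatePage() {
        pages.add(new byte[PAGE_SIZE]);
        return pages.size() - 1;
    }

    /** One logical stream: an ordered list of page indices plus a byte length. */
    final class Stream {
        private final List<Integer> pageList = new ArrayList<>();
        private int length = 0;

        void write(byte[] src) {
            for (byte b : src) {
                // Grow the page list when the stream has filled all of its pages.
                if (length / PAGE_SIZE == pageList.size()) {
                    pageList.add(allocatePage());
                }
                byte[] page = pages.get(pageList.get(length / PAGE_SIZE));
                page[length % PAGE_SIZE] = b;
                length++;
            }
        }

        byte[] read() {
            byte[] out = new byte[length];
            for (int i = 0; i < length; i++) {
                out[i] = pages.get(pageList.get(i / PAGE_SIZE))[i % PAGE_SIZE];
            }
            return out;
        }
    }

    Stream newStream() {
        return new Stream();
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** Interleaves writes to two streams and returns what each reads back. */
    static String[] run() {
        PagedScratchSketch scratch = new PagedScratchSketch();
        Stream image = scratch.newStream();
        Stream page = scratch.newStream();
        image.write(ascii("IMAGE"));
        page.write(ascii("page content"));
        image.write(ascii("DATA"));
        return new String[] {
            new String(image.read(), StandardCharsets.US_ASCII),
            new String(page.read(), StandardCharsets.US_ASCII)
        };
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("image stream: " + result[0]);
        System.out.println("page stream:  " + result[1]);
    }
}
```

Because every page belongs to exactly one stream, interleaved writes end up in different pages and each stream reads back only its own bytes.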
> > >
> > > Here is what needs to be changed:
> > >
> > >    - Add the attached "PagedMultiRandomAccessFile.java" to the I/O
> > >      package
> > >    - Change COSDocument.getScratchFile() to return a
> > >      RandomAccess instance provided by PagedMultiRandomAccessFile:
> > >
> > >        private PagedMultiRandomAccessFile scratchFile = null;
> > >
> > >        [...]
> > >
> > >        public COSDocument(File scratchDir) throws IOException {
> > >                tmpFile = File.createTempFile("pdfbox", "tmp", scratchDir);
> > >                scratchFile = new PagedMultiRandomAccessFile(
> > >                        new RandomAccessFile(tmpFile, "rw"));
> > >        }
> > >
> > >        public COSDocument(RandomAccess file) {
> > >                // scratchFile = file;
> > >                throw new RuntimeException("Not yet implemented."); //$NON-NLS-1$
> > >        }
> > >
> > >        [...]
> > >
> > >        /**
> > >         * Returns a new scratch file.
> > >         *
> > >         * @return the newly created scratch file
> > >         */
> > >        public RandomAccess getScratchFile() {
> > >                return scratchFile.getNewRandomAcess();
> > >        }
> > >
> > > One of the COSDocument constructors takes a RandomAccess file. This
> > > constructor is only called in a single location, namely, in method
> > > PDFParser.parse(). I am not sure if the RandomAccess parameter provided
> > > here is really a scratch file. Someone will have to decide what to do
> > > with this one.
> > >
> > > The code has been thoroughly tested and has been used in the production
> > > of several books without any problems.
> > >
> > > In the attachment please find the code. There is also a JUnit test that
> > > was used to debug my code. I have added an Apache license header and
> > > adopted PDFBox's code style. Feel free to make any desired changes.
> > >
> > > Best regards,
> > >
> > > Stefan Mücke
> > >
> > >
> >
>
>
>
>
>
