From dev-return-62973-archive-asf-public=cust-asf.ponee.io@pdfbox.apache.org Mon May 20 20:41:02 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 7640318077A for ; Mon, 20 May 2019 22:41:02 +0200 (CEST) Received: (qmail 51273 invoked by uid 500); 20 May 2019 20:41:01 -0000 Mailing-List: contact dev-help@pdfbox.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@pdfbox.apache.org Delivered-To: mailing list dev@pdfbox.apache.org Received: (qmail 51082 invoked by uid 99); 20 May 2019 20:41:01 -0000 Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 20 May 2019 20:41:01 +0000 Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 72D24E2B51 for ; Mon, 20 May 2019 20:41:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 3001825815 for ; Mon, 20 May 2019 20:41:00 +0000 (UTC) Date: Mon, 20 May 2019 20:41:00 +0000 (UTC) From: "Tilman Hausherr (JIRA)" To: dev@pdfbox.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (PDFBOX-4542) Suggestion: Don't load large streams completely into memory, reference them instead MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/PDFBOX-4542?page=3Dcom.atlassia= n.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D168= 44276#comment-16844276 ]=20 Tilman Hausherr commented on PDFBOX-4542: ----------------------------------------- See PDFBOX-4453 and PDFBOX-4477 and in SecurityHandler look for {{objects}}= and related comments. > Suggestion: Don't load large streams completely into memory, reference th= em instead > -------------------------------------------------------------------------= ---------- > > Key: PDFBOX-4542 > URL: https://issues.apache.org/jira/browse/PDFBOX-4542 > Project: PDFBox > Issue Type: Improvement > Components: Parsing, PDModel > Affects Versions: 2.0.14 > Reporter: Jonathan > Priority: Minor > Labels: Memory, memory, performance > > As we processed large PDF files, many of which containing large image str= eams, we wanted to avoid loading the entire streams into memory. Instead, w= e implemented a mechanism that merely referenced their location on disk. > We eventually did this by subclassing COSStream, and then overriding COSP= arser.parseCOSStream(COSDictionary) to conditionally create our stream. Her= e is the code, this is currently still a work-in-progress. I've just refact= ored the entire mechanism. > {code:java} > public class ReferencedCOSStream > =C2=A0=C2=A0 extends COSStream > { > =C2=A0=C2=A0 //~ Instance members ---------------------------------------= ---------------------------------------------------------------------------= ------------ > =C2=A0=C2=A0 boolean isReference =3D false; > =C2=A0=C2=A0 File=C2=A0=C2=A0=C2=A0 reference=C2=A0=C2=A0 =3D null; > =C2=A0=C2=A0 long=C2=A0=C2=A0=C2=A0 offset=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = =3D -1; > =C2=A0=C2=A0 long=C2=A0=C2=A0=C2=A0 length=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = =3D -1; > =C2=A0=C2=A0 //~ Constructors -------------------------------------------= ---------------------------------------------------------------------------= ------------ > =C2=A0=C2=A0 private ReferencedCOSStream(final ScratchFile scratchFile) > =C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 super(scratchFile); > =C2=A0=C2=A0 } > =C2=A0=C2=A0 //~ Methods ------------------------------------------------= ---------------------------------------------------------------------------= ------------ > =C2=A0=C2=A0 public static ReferencedCOSStream createFromCOSStream(final = COSStream stream) > =C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final ReferencedCOSStream out =3D new Refe= rencedCOSStream(stream.getScratchFile()); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 for (final Map.Entry ent= ry : stream.entrySet()) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 out.setItem(entry.getKey= (), entry.getValue()); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return out; > =C2=A0=C2=A0 } > =C2=A0=C2=A0 @Override > =C2=A0=C2=A0 public COSInputStream createInputStream(final DecodeOptions = options) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throws IOException > =C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (this.isReference) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final InputStream in =3D= new SlicedFileInputStream(this.reference, this.offset, this.length); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return COSInputStream.cr= eate(getFilterList(), this, in, this.getScratchFile(), options); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return super.createInput= Stream(options); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0 } > =C2=A0=C2=A0 @Override > =C2=A0=C2=A0 public InputStream createRawInputStream() > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throws IOException > =C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (this.isReference) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return new SlicedFileInp= utStream(this.reference, this.offset, this.length); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return super.createRawIn= putStream(); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0 } > =C2=A0=C2=A0 @Override > =C2=A0=C2=A0 public OutputStream createOutputStream(final COSBase filters= ) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throws IOException > =C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 this.isReference =3D false; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return super.createOutputStream(filters); > =C2=A0=C2=A0 } > =C2=A0=C2=A0 @Override > =C2=A0=C2=A0 public OutputStream createRawOutputStream() > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throws IOException > =C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 this.isReference =3D false; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return super.createRawOutputStream(); > =C2=A0=C2=A0 } > =C2=A0=C2=A0 public void setReference(final File file, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 final long offset, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 final long length) > =C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 this.isReference =3D true; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 this.reference=C2=A0=C2=A0 =3D file; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 this.offset=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = =3D offset; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 this.length=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = =3D length; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 this.setLong(COSName.LENGTH, length); > =C2=A0=C2=A0 } > =C2=A0=C2=A0 //~ Inner Classes ------------------------------------------= ---------------------------------------------------------------------------= ------------ > =C2=A0=C2=A0 private class SlicedFileInputStream > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 extends FileInputStream > =C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 //~ Instance members ---------------------= ---------------------------------------------------------------------------= --------------------------- > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 private long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 index; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 private final long length; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 //~ Constructors -------------------------= ---------------------------------------------------------------------------= --------------------------- > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 public SlicedFileInputStream(final File fi= le, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final long offset= , > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final long length= ) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throws FileNotFoundExcep= tion, IOException > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 super(file); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 this.length =3D length; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 this.skip(offset); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 this.index =3D 0; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 //~ Methods ------------------------------= ---------------------------------------------------------------------------= --------------------------- > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 @Override > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 public int available() > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throws IOException > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final long remaining =3D= length - index; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (remaining < 0) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return= 0; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return (int)remaining; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 @Override > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 public int read(final byte[] b) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throws IOException > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final int remaining =3D = this.available(); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final int len=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 =3D (remaining < b.length) ? remaining : b.leng= th; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 index +=3D len; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (len > 0) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return= super.read(b, 0, len); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return= -1; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 @Override > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 public int read(final byte[] b, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final int=C2=A0=C2= =A0=C2=A0 off, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 len) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throws IOException > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final int remaining =3D = this.available(); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 len=C2=A0=C2=A0 =3D=C2= =A0 (remaining < len) ? remaining : len; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 index +=3D len; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (len > 0) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return= super.read(b, 0, len); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return= -1; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 @Override > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 public long skip(final long n) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throws IOException > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 index +=3D n; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return super.skip(n); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 @Override > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 public FileChannel getChannel() > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throw new UnsupportedOpe= rationException("Obtaining a FileChannel is not supported because a correct= offset cannot be ensured."); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0 } > } > {code} > {code:java} > =C2=A0=C2=A0 @Override > =C2=A0=C2=A0 protected COSStream parseCOSStream(final COSDictionary dic) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throws IOException > =C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * This needs to be dic.getItem becau= se when we are parsing, the underlying object might still be null. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 */ > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final COSNumber streamLengthObj =3D getLen= gth(dic.getItem(COSName.LENGTH), dic.getCOSName(COSName.TYPE)); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 COSStream=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 stream=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =3D docume= nt.createCOSStream(dic); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 // read 'stream'; this was already tested = in parseObjectsDynamically() > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 readString(); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 skipWhiteSpaces(); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (streamLengthObj =3D=3D null) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (isLenient) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 LOG.wa= rn("The stream doesn't provide any stream length, using fallback readUntilE= nd, at offset " + source.getPosition()); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throw = new IOException("Missing length for stream."); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if ((streamLengthObj !=3D null) && (stream= LengthObj.longValue() >=3D 1024)) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final long=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 streamBegPos =3D source.getPosition(); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final ReferencedCOSStrea= m refStream=C2=A0=C2=A0=C2=A0 =3D ReferencedCOSStream.createFromCOSStream(s= tream); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 try > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 readVa= lidStream(null, streamLengthObj); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 finally > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 stream= .setItem(COSName.LENGTH, streamLengthObj); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 refStream.setReference(n= ew File(reference), streamBegPos, source.getPosition() - streamBegPos); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 stream =3D refStream; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 try(final OutputStream o= ut =3D stream.createRawOutputStream()) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if ((s= treamLengthObj !=3D null) && validateStreamLength(streamLengthObj.longValue= ())) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 readValidStream(out, streamLengthObj); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 readUntilEndStream(new EndstreamOutputStream(out)); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 finally > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 stream= .setItem(COSName.LENGTH, streamLengthObj); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 final String endStream =3D readString(); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (endStream.equals("endobj") && isLenien= t) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 LOG.warn("stream ends wi= th 'endobj' instead of 'endstream' at offset " + source.getPosition()); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 // avoid follow-up warni= ng about missing endobj > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 source.rewind(ENDOBJ.len= gth); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else if ((endStream.length() > 9) && isLen= ient && endStream.substring(0, 9).equals(ENDSTREAM_STRING)) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 LOG.warn("stream ends wi= th '" + endStream + "' instead of 'endstream' at offset " + source.getPosit= ion()); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 // unread the "extra" by= tes > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 source.rewind(endStream.= substring(9).getBytes(ISO_8859_1).length); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else if (!endStream.equals(ENDSTREAM_STRIN= G)) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 throw new IOException("E= rror reading stream, expected=3D'endstream' actual=3D'" + endStream + "' at= offset " + source.getPosition()); > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return stream; > =C2=A0=C2=A0 } > {code} > The class ReferencedCOSStream exposes the underlying data in exactly the = same way as it does COSStream, but instead of keeping the storage in memory= , it always opens a FileInputStream to retrieve the content. SlicedFileInpu= tStream basically wraps around a FileInputStream and tries to imitate the b= ehaviour of an InputStream for this specific chunk of data. > I needed to expose some APIs for these classes, the method ReferencedCOSS= tream.createFromCOSStream(COSStream) would better be located in PDDocument = and create the stream directly, I just didn't want to also modify PDDocumen= t. > Right now, encrypted streams are currently loaded into memory by the Secu= rityHandler directly after creation. If you want to accept this proposal, i= t might make sense to move the decryption handling also into COSStream and = ReferencedCOSStream and perform it upon request. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org For additional commands, e-mail: dev-help@pdfbox.apache.org