pdfbox-dev mailing list archives

From "Tilman Hausherr (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-4215) Get pages from a HTTP stream of a large pdf file
Date Wed, 09 May 2018 16:46:00 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469065#comment-16469065 ]

Tilman Hausherr commented on PDFBOX-4215:

If you don't have enough memory and can't use the disk for a scratch file, then you'll be
limited. "Parse on demand" may come in the future, but we don't know when. You might
try https://github.com/torakiki/sambox, which is a fork of PDFBox.

> Get pages from a HTTP stream of a large pdf file
> ------------------------------------------------
>                 Key: PDFBOX-4215
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4215
>             Project: PDFBox
>          Issue Type: Wish
>          Components: Parsing
>    Affects Versions: 2.0.9
>            Reporter: Alexandre
>            Priority: Minor
> Hi Apache contributors,
> Suppose I have a very big PDF file and I want to split this file into file chunks (e.g.
> one file per page). I cannot load the entire file into memory and I cannot use the hard
> disk of the computer as described in the doc for large files... :D. But I still have the
> stream of the file, line by line.
> I read that it is not feasible to get the pages of the PDF in order (because of the
> PDF specs), but is it feasible to load random pages if you read line by line and look for
> page breaks in PDFBox?
> Hagd, A.
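
The PDF-spec constraint mentioned in the question can be illustrated with a small sketch. A PDF reader normally starts at the end of the file: the "startxref" keyword there gives the byte offset of the cross-reference table, and that table says where each object (including page objects) lives. A forward-only stream therefore reaches the index last. The class and method names below are hypothetical, this is plain Java rather than PDFBox API, and it only handles the simple case of a classic trailer.

```java
import java.nio.charset.StandardCharsets;

public class StartXrefDemo {

    /**
     * Returns the byte offset written after the last "startxref" keyword,
     * or -1 if the keyword or a following number is not present.
     * "tail" is assumed to hold the final bytes of a PDF file.
     */
    public static long findStartXref(byte[] tail) {
        // ISO-8859-1 maps each byte to exactly one char, so indices stay aligned.
        String text = new String(tail, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + "startxref".length();
        // Skip the whitespace / end-of-line characters after the keyword.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        // Collect the decimal digits of the offset.
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        return i > start ? Long.parseLong(text.substring(start, i)) : -1;
    }

    public static void main(String[] args) {
        // Hand-written stand-in for the last bytes of a PDF file (assumption).
        byte[] tail = "trailer\n<< /Size 6 /Root 1 0 R >>\nstartxref\n1234\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartXref(tail));
    }
}
```

Because this offset is only known once the end of the file has been read, a parser working on a stream generally has to buffer the content somewhere (memory or a scratch file) before it can resolve pages.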

