pdfbox-users mailing list archives

From Grant Overby <gr...@floorsoft.com>
Subject Re: Save URLs to PDFs?
Date Fri, 05 Nov 2010 22:53:07 GMT
I ran the code [2]. The PDF it writes is corrupted: the MD5 hashes of the
two files differ even though the file sizes are identical [1].

1:
11/05/2010  06:47 PM         2,371,050 msb201055.pdf
11/05/2010  06:46 PM         2,371,050 My.pdf
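
A minimal sketch of the comparison, assuming both files fit in memory. The class and method names (CompareFiles, md5Hex, firstDifference) are hypothetical and not from this thread; it hashes each file with MD5 and reports the offset of the first byte that differs, which is roughly what a binary diff would show.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class CompareFiles {
  // Hex-encoded MD5 digest of a byte array.
  static String md5Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(data)) {
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required on every Java platform, so this is not expected.
      throw new IllegalStateException(e);
    }
  }

  // Offset of the first differing byte, or -1 if the arrays are identical.
  // If one array is a prefix of the other, returns the shorter length.
  static long firstDifference(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      if (a[i] != b[i]) {
        return i;
      }
    }
    return a.length == b.length ? -1 : n;
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.err.println("usage: CompareFiles <file1> <file2>");
      return;
    }
    byte[] a = Files.readAllBytes(Paths.get(args[0]));
    byte[] b = Files.readAllBytes(Paths.get(args[1]));
    System.out.println(args[0] + "  " + a.length + " bytes  " + md5Hex(a));
    System.out.println(args[1] + "  " + b.length + " bytes  " + md5Hex(b));
    System.out.println("first differing offset: " + firstDifference(a, b));
  }
}
```

Run it with the browser download and the Java download as the two arguments, for example msb201055.pdf and My.pdf.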



2:
package s;

import java.io.FileWriter;
import java.io.InputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.MalformedURLException;

public class Main
{
  public static void main(String[] args) throws IOException
  {
    URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");

    URLConnection con = url.openConnection();

    InputStream in = con.getInputStream();

    FileWriter out = new FileWriter("C:/My.pdf");

    int next = 0;
    while ( ( next = in.read() ) != -1  ) {
      out.write(next);
    }
    out.flush();
    out.close();
    in.close();
  }
}
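
A likely cause of the mismatch is FileWriter. It is a character stream, so every value passed to write() is treated as a char and encoded with the platform default charset, which can alter bytes in binary data such as a PDF. With a single-byte default charset the length can stay the same while the content changes, which would match identical sizes with different MD5s. Below is a sketch of a byte-for-byte copy using FileOutputStream instead. The class name BinaryDownload and the copy helper are hypothetical names, and the URL and target path are taken from the command line.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

class BinaryDownload {
  // Copies raw bytes from in to out with no character conversion.
  // Returns the number of bytes copied.
  static long copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[8192];
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {
      out.write(buffer, 0, n);
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.err.println("usage: BinaryDownload <url> <target-file>");
      return;
    }
    URLConnection con = new URL(args[0]).openConnection();
    // try-with-resources closes both streams even if the copy fails.
    try (InputStream in = con.getInputStream();
         OutputStream out = new FileOutputStream(args[1])) {
      long bytes = copy(in, out);
      System.out.println("wrote " + bytes + " bytes to " + args[1]);
    }
  }
}
```

Reading into a buffer rather than one byte at a time also avoids a method call per byte, though the fix for the corruption is the switch from a Writer to an OutputStream.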




--
Grant Overby
Senior Developer
FloorSoft, Inc.

Often people, especially computer engineers, focus on the machines. They
think, "By doing this, the machine will run faster. By doing this, the
machine will run more effectively. By doing this, the machine will something
something something." They are focusing on machines. But in fact we need to
focus on humans, on how humans care about doing programming or operating the
application of the machines. We are the masters. They are the slaves. --
Yukihiro Matsumoto




On Fri, Nov 5, 2010 at 6:45 PM, <Adam@swmc.com> wrote:

> Yogesh,
>
> Compare the file size and hash (SHA1, MD5, etc.) of the file you download
> from your browser with the file that Java downloads.  The end of the file
> may be missing when you download it via Java.  I know you said the file
> size is correct, but is it the *exact* same number of bytes?  If so, then
> the content must be different, and it should just be a matter of running
> `diff` on the files to see what's going wrong.
>
> ----
> Thanks,
> Adam
>
>
>
>
>
> From: Yogesh <yogeshp08@gmail.com>
> To: grant@floorsoft.com
> Cc: users@pdfbox.apache.org
> Date: 11/05/2010 15:29
> Subject: Re: Save URLs to PDFs?
>
>
>
> Yes. I can download the file through the browser. It works perfectly fine.
>
> - Yogesh
>
>
>
> On 5 November 2010 18:25, Grant Overby <grant@floorsoft.com> wrote:
>
> > If you download the file through a browser, does it work then?
> >
> >
> > On Fri, Nov 5, 2010 at 6:18 PM, Yogesh <yogeshp08@gmail.com> wrote:
> >
> >> I tried that; it writes a blank PDF, although the file size and the
> >> number of pages are correct (for the newly written file).
> >>
> >> - Yogesh
> >>
> >>
> >>
> >>
> >> On 5 November 2010 18:09, Grant Overby <grant@floorsoft.com> wrote:
> >>
> >>> You don't need PDFBox to do this. Below is some rough code that allows
> >>> you to download a file and save it.
> >>>
> >>> URLConnection urlConnection = new URL("http://...").openConnection();
> >>> InputStream   in      = urlConnection.getInputStream();
> >>> FileWriter out = new FileWriter("my.pdf");
> >>> int next = 0;
> >>> while ( ( next = in.read() ) != -1  ) out.write(next);
> >>> //close everything
> >>>
> >>>
> >>> On Fri, Nov 5, 2010 at 5:56 PM, Yogesh <yogeshp08@gmail.com> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > I have PDFs which I can access through URLs. I want to download and
> >>> > save them to files. How can I go about it?
> >>> >
> >>> > Thanks
> >>> >
> >>> > -Yogesh
> >>> >
> >>>
> >>
> >>
> >
>
>
>
>
