pdfbox-users mailing list archives

From Yogesh <yogesh...@gmail.com>
Subject Re: Save URLs to PDFs?
Date Fri, 05 Nov 2010 22:58:07 GMT
Thanks Grant.
But I have thousands of PDF URLs like this. I have tried around 12 so far.
Can all of them be corrupt?

What can I do about this?


- Yogesh
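For thousands of URLs, a loop along the following lines may help. It is only a sketch under stated assumptions: the URLs sit one per line in a text file passed as the first argument, each download is saved as a numbered `.pdf` file in the directory given as the second argument, and `BatchPdfDownload` and its helpers are hypothetical names. Each response is copied as raw bytes with `Files.copy` (no `Reader` or `Writer` involved). The first bytes of each saved file are then checked for the `%PDF-` header, because a server may answer with an HTML page (for example an error or redirect page) instead of the PDF, and that would also look like a "corrupt" PDF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

public class BatchPdfDownload {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if the given leading bytes start with the "%PDF-" header.
    public static boolean looksLikePdf(byte[] head) {
        return head.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, PDF_MAGIC.length), PDF_MAGIC);
    }

    // Reads up to n bytes from the start of a file (fewer if the file is shorter).
    public static byte[] readHead(Path file, int n) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[n];
            int len = 0;
            while (len < n) {
                int r = in.read(buf, len, n - len);
                if (r == -1) {
                    break;
                }
                len += r;
            }
            return Arrays.copyOf(buf, len);
        }
    }

    // Saves the response body for one URL to target as raw bytes.
    public static void download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical inputs: args[0] = text file with one URL per line,
        // args[1] = output directory (created if missing).
        List<String> urls = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Path outDir = Files.createDirectories(Paths.get(args[1]));
        int index = 0;
        for (String line : urls) {
            String url = line.trim();
            if (url.isEmpty()) {
                continue;
            }
            Path target = outDir.resolve(String.format("%05d.pdf", index++));
            try {
                download(url, target);
                if (!looksLikePdf(readHead(target, PDF_MAGIC.length))) {
                    System.err.println("NOT A PDF  " + url + " -> " + target);
                }
            } catch (IOException | IllegalArgumentException e) {
                // One bad URL should not stop the whole batch.
                System.err.println("FAILED     " + url + ": " + e);
            }
        }
    }
}
```

This requires Java 7 or later (`java.nio.file`, try-with-resources, multi-catch). Files flagged as "NOT A PDF" are worth opening in a text editor to see what the server actually sent.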




On 5 November 2010 18:53, Grant Overby <grant@floorsoft.com> wrote:

> I ran the code [2]. The PDF is corrupted by the code: the file sizes are
> identical [1], but the MD5s are different.
>
> 1:
> 11/05/2010  06:47 PM         2,371,050 msb201055.pdf
> 11/05/2010  06:46 PM         2,371,050 My.pdf
>
>
>
> 2:
> package s;
>
> import java.io.FileWriter;
> import java.io.IOException;
> import java.io.InputStream;
> import java.net.URL;
> import java.net.URLConnection;
>
> public class Main
> {
>   public static void main(String[] args) throws IOException
>   {
>     URL url = new URL(
>         "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");
>
>     URLConnection con = url.openConnection();
>
>     InputStream in = con.getInputStream();
>
>     // Note: FileWriter is a character stream. Each byte read here is written
>     // as a char and re-encoded with the platform default charset, which can
>     // alter binary content such as a PDF.
>     FileWriter out = new FileWriter("C:/My.pdf");
>
>     int next = 0;
>     while ( ( next = in.read() ) != -1 ) {
>       out.write(next);
>     }
>     out.flush();
>     out.close();
>     in.close();
>   }
> }
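A likely explanation for "same size, different MD5": `FileWriter` is meant for text. Each byte returned by `in.read()` is written as a `char` and then encoded with the platform's default charset, so bytes above 0x7F are not guaranteed to come out unchanged. With a single-byte default charset such as windows-1252, some of those bytes may be replaced, which would keep the length identical while changing the content. Below is a minimal sketch of the same program writing raw bytes through `FileOutputStream` instead; the class name `BinaryDownload` and the `copy` helper are illustrative, and try-with-resources assumes Java 7 or later.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URLConnection;

public class BinaryDownload {

    // Copies every byte from in to out unchanged; no charset conversion happens.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        URLConnection con = URI.create(
                "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez")
                .toURL().openConnection();
        // try-with-resources closes both streams even if the copy fails part way.
        try (InputStream in = con.getInputStream();
             OutputStream out = new FileOutputStream("C:/My.pdf")) {
            long written = copy(in, out);
            System.out.println("Wrote " + written + " bytes");
        }
    }
}
```

Comparing the MD5 of the resulting file with the browser copy is a quick way to confirm whether the charset conversion was the cause.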
>
>
>
>
> --
> Grant Overby
> Senior Developer
> FloorSoft, Inc.
>
> Often people, especially computer engineers, focus on the machines. They
> think, "By doing this, the machine will run faster. By doing this, the
> machine will run more effectively. By doing this, the machine will
> something something something." They are focusing on machines. But in fact
> we need to focus on humans, on how humans care about doing programming or
> operating the application of the machines. We are the masters. They are the
> slaves.
> -- Yukihiro Matsumoto
>
>
>
>
> On Fri, Nov 5, 2010 at 6:45 PM, <Adam@swmc.com> wrote:
>
> > Yogesh,
> >
> > Compare the file size and hash (SHA1, MD5, etc.) of the file you download
> > from your browser with the file that Java downloads.  The end of the file
> > may be missing when you download it via Java.  I know you said the file
> > size is correct, but is it the *exact* same number of bytes?  If so, then
> > the content must be different, and it should just be a matter of running
> > `diff` on the files to see what's going wrong.
> >
> > ----
> > Thanks,
> > Adam
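Adam's check can be scripted. The sketch below (class and method names are illustrative; it assumes Java 7+ for `java.nio.file.Files`) reads both files fully into memory, prints their sizes and MD5 digests, and reports the offset of the first differing byte, which shows where the corruption starts. Reading whole files is fine for files the size of the 2,371,050-byte PDF in this thread, but much larger files would call for a streaming comparison.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class CompareDownloads {

    // Hex-encoded MD5 digest of the given bytes.
    public static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Offset of the first byte where a and b differ, or -1 if they are identical.
    public static long firstDifference(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : common;
    }

    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        // args[0]: file saved by the browser, args[1]: file saved by the Java program
        byte[] fromBrowser = Files.readAllBytes(Paths.get(args[0]));
        byte[] fromJava = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("sizes: " + fromBrowser.length + " vs " + fromJava.length);
        System.out.println("md5:   " + md5Hex(fromBrowser) + " vs " + md5Hex(fromJava));
        long offset = firstDifference(fromBrowser, fromJava);
        if (offset < 0) {
            System.out.println("files are identical");
        } else {
            System.out.printf("first difference at byte %d%n", offset);
        }
    }
}
```

Usage, with hypothetical file names: `java CompareDownloads browser.pdf My.pdf`.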
> >
> >
> >
> >
> >
> > From: Yogesh <yogeshp08@gmail.com>
> > To: grant@floorsoft.com
> > Cc: users@pdfbox.apache.org
> > Date: 11/05/2010 15:29
> > Subject: Re: Save URLs to PDFs?
> >
> >
> >
> > Yes. I can download the file through the browser. It works perfectly
> > fine.
> >
> > - Yogesh
> >
> >
> >
> > On 5 November 2010 18:25, Grant Overby <grant@floorsoft.com> wrote:
> >
> > > If you download the file through a browser, does it work then?
> > >
> > >
> > > --
> > > Grant Overby
> > >
> > >
> > >
> > >
> > > On Fri, Nov 5, 2010 at 6:18 PM, Yogesh <yogeshp08@gmail.com> wrote:
> > >
> > >> I tried that, and it writes a blank PDF, although the file size and the
> > >> number of pages of the newly written file are correct.
> > >>
> > >> - Yogesh
> > >>
> > >>
> > >>
> > >>
> > >> On 5 November 2010 18:09, Grant Overby <grant@floorsoft.com> wrote:
> > >>
> > >>> You don't need PDFBox to do this. Below is some rough code that allows
> > >>> you to download a file and save it.
> > >>>
> > >>> URLConnection urlConnection = new URL("http://...").openConnection();
> > >>> InputStream   in      = urlConnection.getInputStream();
> > >>> FileWriter out = new FileWriter("my.pdf");
> > >>> int next = 0;
> > >>> while ( ( next = in.read() ) != -1  ) out.write(next);
> > >>> //close everything
> > >>>
> > >>> --
> > >>> Grant Overby
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Nov 5, 2010 at 5:56 PM, Yogesh <yogeshp08@gmail.com> wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> > I have PDFs which I can access through URLs. I want to download them
> > >>> > and save them to files. How can I go about it?
> > >>> >
> > >>> > Thanks
> > >>> >
> > >>> > -Yogesh
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
> >
> >
> >
>
