Thanks Grant.
But I have thousands of PDF URLs like this. I have tried around 12 so far.
Can all of them be corrupt?
What can I do about this?
- Yogesh
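
A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: the quoted code below writes the download through a FileWriter, which is a character stream. Each byte is treated as a character and re-encoded with a charset on the way out. With a single-byte platform charset, values that cannot be encoded are typically replaced by one substitute byte each, which would be consistent with identical file sizes but different MD5s. That would apply to every PDF saved this way, so the source files need not be corrupt. The class and method names here (BinaryDownload, copy, download) are hypothetical, and try-with-resources assumes Java 7 or later. The sketch copies raw bytes through a FileOutputStream instead:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URL;

// Hypothetical helper class; the names are illustrative only.
final class BinaryDownload {

    // Copies every byte from in to out unchanged and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Saves the resource at url to targetPath using byte streams only,
    // so no character encoding is applied to the data.
    static long download(URL url, String targetPath) throws IOException {
        try (InputStream in = new BufferedInputStream(url.openConnection().getInputStream());
             OutputStream out = new BufferedOutputStream(new FileOutputStream(targetPath))) {
            return copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BinaryDownload <url> <output-file>");
            return;
        }
        long bytes = download(URI.create(args[0]).toURL(), args[1]);
        System.out.println("Wrote " + bytes + " bytes to " + args[1]);
    }
}
```

For thousands of URLs, download could be called in a loop over the list. Comparing the MD5 of one result against the browser-downloaded copy would confirm whether this was the problem.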
On 5 November 2010 18:53, Grant Overby <grant@floorsoft.com> wrote:
> I ran the code [2]. The code corrupts the PDF: the MD5s are different,
> even though the file sizes are identical [1].
>
> 1:
> 11/05/2010 06:47 PM 2,371,050 msb201055.pdf
> 11/05/2010 06:46 PM 2,371,050 My.pdf
>
>
>
> 2:
> package s;
>
> import java.io.FileWriter;
> import java.io.InputStream;
> import java.io.IOException;
> import java.net.URL;
> import java.net.URLConnection;
> import java.net.MalformedURLException;
>
> public class Main
> {
>     public static void main(String[] args) throws IOException
>     {
>         URL url = new URL("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2947364/pdf/msb201055.pdf?tool=pmcentrez");
>         URLConnection con = url.openConnection();
>         InputStream in = con.getInputStream();
>         FileWriter out = new FileWriter("C:/My.pdf");
>
>         int next = 0;
>         while ( ( next = in.read() ) != -1 ) {
>             out.write(next);
>         }
>         out.flush();
>         out.close();
>         in.close();
>     }
> }
>
>
>
>
> --
> Grant Overby
> Senior Developer
> FloorSoft, Inc.
>
> Often people, especially computer engineers, focus on the machines. They
> think, "By doing this, the machine will run faster. By doing this, the
> machine will run more effectively. By doing this, the machine will
> something something something." They are focusing on machines. But in
> fact we need to focus on humans, on how humans care about doing
> programming or operating the application of the machines. We are the
> masters. They are the slaves. -- Yukihiro Matsumoto
>
>
>
>
> On Fri, Nov 5, 2010 at 6:45 PM, <Adam@swmc.com> wrote:
>
> > Yogesh,
> >
> > Compare the file size and hash (SHA1, MD5, etc.) of the file you download
> > from your browser with the file that Java downloads. The end of the file
> > may be missing when you download it via Java. I know you said the file
> > size is correct, but is it the *exact* same number of bytes? If so, then
> > the content must be different, and it should just be a matter of running
> > `diff` on the files to see what's going wrong.
> >
> > ----
> > Thanks,
> > Adam
> >
> >
> >
> >
> >
> > From: Yogesh <yogeshp08@gmail.com>
> > To: grant@floorsoft.com
> > Cc: users@pdfbox.apache.org
> > Date: 11/05/2010 15:29
> > Subject: Re: Save URLs to PDFs?
> >
> >
> >
> > Yes. I can download the file through the browser. It works perfectly
> > fine.
> >
> > - Yogesh
> >
> >
> >
> > On 5 November 2010 18:25, Grant Overby <grant@floorsoft.com> wrote:
> >
> > > If you download the file through a browser, does it work then?
> > >
> > >
> > >
> > > On Fri, Nov 5, 2010 at 6:18 PM, Yogesh <yogeshp08@gmail.com> wrote:
> > >
> > >> I tried that; it writes a blank PDF, although the file size and the
> > >> number of pages are correct (for the newly written file).
> > >>
> > >> - Yogesh
> > >>
> > >>
> > >>
> > >>
> > >> On 5 November 2010 18:09, Grant Overby <grant@floorsoft.com> wrote:
> > >>
> > >>> You don't need PDFBox to do this. Below is some rough code that
> > >>> allows you to download a file and save it.
> > >>>
> > >>> URLConnection urlConnection = new URL("http://...").openConnection();
> > >>> InputStream in = urlConnection.getInputStream();
> > >>> FileWriter out = new FileWriter("my.pdf");
> > >>> int next = 0;
> > >>> while ( ( next = in.read() ) != -1 ) out.write(next);
> > >>> //close everything
> > >>>
> > >>>
> > >>> On Fri, Nov 5, 2010 at 5:56 PM, Yogesh <yogeshp08@gmail.com> wrote:
> > >>>
> > >>> > Hi,
> > >>> >
> > >>> > I have PDFs which I can access through URLs. I want to download
> > >>> > them and save them to files. How can I go about it?
> > >>> >
> > >>> > Thanks
> > >>> >
> > >>> > -Yogesh
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
> >
> >