Date: Fri, 10 Apr 2009 15:55:09 -0700
From: Rutuja Joshi
To: httpclient-users@hc.apache.org
Subject: Not able to download PDF and PNG files using Httpclient
Message-id: <49DFCE4D.7070002@sun.com>

Hello,

I am working on a web crawler application and am using Apache HttpClient for it. I have the following issues that I have not been able to resolve. (This is my first post and I am not sure how much detail I should provide or how many questions I may ask, so please pardon me.)

1> Whenever I try to download a PDF file using HttpClient, the downloaded file is approximately half the size of the one I download using Firefox. The same happens with PNG files. Both Acrobat and the image viewer reject the files as having an invalid format. It may be related to compression, but how do I find out? I read the response as an input stream, wrap it in a buffered stream, and write it to a file, so I am just fetching the raw bytes from the response. If needed, I will provide a detailed log (I read about the wire log; I haven't tried it yet, but I can produce one if needed).

2> How do I know whether the file I am fetching is a text file or not? For example, given that I do not know the type of the file in advance, is there any way to tell from the Content-Type header etc. what type of file I have fetched? I tried the Content-Type header; it is the same for a normal HTML file, a PDF file, and an image file.

3> Redirects: I have set followredirects = true. I have one URL that redirects when accessed from Firefox, but not when accessed using HttpClient. The status code is 200 (OK) for some reason. Does it have to be 3XX for HttpClient to follow the redirect? The HTML dump from HttpClient is as follows:

Redirecting...
This url is deprecated. If your browser doesn't immediately redirect you to the new url, please click the link below:
http://www.feedroom.com/

Thanks in advance!
Rutuja
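
For reference, the byte-for-byte copy described in point 1 and a simple file-type check related to point 2 might be sketched as below. This is only an illustration using plain java.io. It does not use any HttpClient API, the class and method names (RawBytes, copy, sniff) are hypothetical, and it is not claimed to be the cause of or the fix for the truncated downloads. The signature check assumes the standard leading bytes of PDF ("%PDF-") and PNG files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper; names are illustrative and not part of HttpClient.
final class RawBytes {

    // Copies every byte from in to out with no character decoding.
    // Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read() may fill only part of the buffer, so write exactly n bytes.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    // Guesses the file type from its leading bytes instead of the
    // Content-Type header. Only PDF and PNG are recognised here.
    static String sniff(byte[] head) {
        byte[] pdf = {'%', 'P', 'D', 'F', '-'};
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        if (startsWith(head, pdf)) return "pdf";
        if (startsWith(head, png)) return "png";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a response body; a real crawler would pass
        // the response stream and a FileOutputStream instead.
        byte[] fakePng = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A, 0, (byte) 0xFF};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(fakePng), sink);
        System.out.println(copied + " bytes, type=" + sniff(sink.toByteArray()));
    }
}
```

Comparing the count returned by copy with the Content-Length reported by the server (when one is sent) is one way to see whether bytes are being lost during the copy or were never received.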