manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject RE: Timeout problems with web crawling
Date Tue, 23 Apr 2013 12:18:00 GMT
Are you able to run Wireshark or tcpdump on this machine? If so, can you
set up a crawl with only that URL and compare the crawler's fetch with
the curl fetch? There must be some key difference between the two requests.
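
A minimal sketch of such a capture, not a tested recipe for this machine. It assumes the site is served over plain HTTP on port 80, that tcpdump is run as root, and that the `any` pseudo-interface is available (Linux). The file names crawl.pcap and curl.pcap and the helper capture_filter are made up for illustration. The script only prints the commands so they can be reviewed before running them.

```shell
#!/bin/bash
# Assumptions (not from the thread): plain HTTP on port 80, Linux "any" interface.
HOST="www.ibsen.uio.no"
PORT=80

# Hypothetical helper: build a capture filter that matches only
# TCP traffic to or from the target host on the given port.
capture_filter() {
  printf 'host %s and tcp port %s' "$1" "$2"
}

FILTER="$(capture_filter "$HOST" "$PORT")"

# Print the commands instead of running them, since capturing needs root.
# 1. Capture while the single-URL crawl job runs (stop with Ctrl-C).
echo "tcpdump -i any -s 0 -w crawl.pcap '$FILTER'"
# 2. Capture again while running the curl command.
echo "tcpdump -i any -s 0 -w curl.pcap '$FILTER'"
# 3. Dump both captures as ASCII to compare request headers side by side.
echo "tcpdump -A -r crawl.pcap"
echo "tcpdump -A -r curl.pcap"
```

Comparing the two ASCII dumps should show any difference in request line, headers, or connection handling between the crawler and curl.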

Karl

From: Erlend Garåsen
Sent: 4/23/2013 8:03 AM
To: user@manifoldcf.apache.org
Subject: Re: Timeout problems with web crawling
On 23.04.13 13.48, Erlend Garåsen wrote:

> -bash-3.2$ curl -vvv -H "User-Agent: Mozilla/5.0
> (ApacheManifoldCFWebCrawler; sok-core@usit.uio.no)"
> "http://www.ibsen.uio.no/REGINFO_peAGa.xhtml?bokstav=G|1366644879398+299979"

A small typo in the URL, so the correct command is:
curl -vvv -H "User-Agent: Mozilla/5.0 (ApacheManifoldCFWebCrawler; sok-core@usit.uio.no)" "http://www.ibsen.uio.no/REGINFO_peAGa.xhtml?bokstav=G"

But the result is the same: an immediate response.

Erlend

-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
