manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Crawling behind an ISA proxy (iis 7.5)
Date Sun, 08 Jul 2012 23:46:23 GMT
The patch has been committed and will be part of the 0.6 release.

Karl

On Sun, Jul 8, 2012 at 9:54 AM, Karl Wright <daddywri@gmail.com> wrote:
> Thanks for all of your work on this.  I'll be able to commit this patch
> tonight.
>
>
> Karl
>
> Sent from my Windows Phone
> ________________________________
> From: Jan van Haarst
> Sent: 7/8/2012 6:40 AM
>
> To: user@manifoldcf.apache.org
> Subject: Re: Crawling behind an ISA proxy (iis 7.5)
>
> Dear All,
>
> We are now able to connect to the IIS proxy. Thanks to the logging
> facilities Karl added, we were able to see that this is the fix:
>
> Index: connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java
> ===================================================================
> --- connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java (revision 1357379)
> +++ connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/WebcrawlerConnector.java (working copy)
> @@ -361,7 +361,7 @@
>        String emailAddress = params.getParameter(WebcrawlerConfig.PARAMETER_EMAIL);
>        if (emailAddress == null)
>          throw new ManifoldCFException("Missing email address");
> -      userAgent = "ApacheManifoldCFWebCrawler; "+emailAddress+")";
> +      userAgent = "Mozilla/5.0 (ApacheManifoldCFWebCrawler; "+emailAddress+")";
>        from = emailAddress;
>
>        x = params.getParameter(WebcrawlerConfig.PARAMETER_ROBOTSUSAGE);
>
> Yes, this is weird: a proxy shouldn't fail on User-Agent settings, but
> apparently this one does.
> Even Google apparently uses the Mozilla/5.0 prefix for its crawler:
> http://www.useragentstring.com/pages/Googlebot/
> Now we 'just' have to get the crawling working, but the main (and only)
> hurdle has now been cleared!
>
> Karl, a big Thank You for your help, and for the openssl s_client that
> enabled us to debug this.
>
> Dag,
> Jan
>
> On Thu, Jun 28, 2012 at 11:05 PM, Jan van Haarst <jan@vanhaarst.net> wrote:
>>
>> On Thu, Jun 28, 2012 at 11:26 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>> I was wondering if you'd picked up and tried the patch for
>>> CONNECTORS-483.  This patch adds official proxy support for the Web
>>> Connector.  Alternatively, you could try to build and run with trunk
>>> code.
>>>
>>> Karl
>>
>>
>> I'm going the build-from-trunk route, and all seems to go well up to the
>> creation of the zip and tar.gz files.
>> Is there anything special to do after running the build process like this?
>>
>> ant clean clean-core-deps clean-deps && ant make-core-deps make-deps build
>> && ant image
>>
>> Did I miss anything?
>> If not, I'll replace the old binary installation with my source-build one,
>> and see where it leads me.
>>
>> --
>> Dag,
>> Jan
>
>
>
>
> --
> Dag,
> Jan
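
For readers who want to see the effect of the one-line change in isolation, here is a minimal sketch of the string construction from the patch above. The class name UserAgentSketch and the method buildUserAgent are hypothetical and are not part of ManifoldCF; only the string format is taken from the diff. Note that the old value ended with a closing parenthesis that had no matching opening one. The thread does not establish whether that, or the missing Mozilla/5.0 prefix, is what the proxy objected to.

```java
// Hypothetical stand-alone sketch; not the actual WebcrawlerConnector code.
class UserAgentSketch {

    // Builds the User-Agent value in the format introduced by the patch.
    // The real connector throws ManifoldCFException for a missing address;
    // IllegalArgumentException is used here only to keep the sketch
    // free of ManifoldCF dependencies.
    static String buildUserAgent(String emailAddress) {
        if (emailAddress == null) {
            throw new IllegalArgumentException("Missing email address");
        }
        return "Mozilla/5.0 (ApacheManifoldCFWebCrawler; " + emailAddress + ")";
    }

    // The pre-patch format, kept for comparison: no "Mozilla/5.0 (" prefix,
    // so the trailing ")" is unbalanced.
    static String buildOldUserAgent(String emailAddress) {
        return "ApacheManifoldCFWebCrawler; " + emailAddress + ")";
    }
}
```

Calling UserAgentSketch.buildUserAgent with the configured contact address yields a value shaped like the browser-style strings that many crawlers send, which is what the proxy in this thread accepted.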
