nutch-user mailing list archives

From Susam Pal <susam....@gmail.com>
Subject Re: Proxy Authentication
Date Fri, 12 Mar 2010 09:47:18 GMT
On Fri, Mar 12, 2010 at 2:09 PM, Graziano Aliberti
<graziano.aliberti@eng.it> wrote:
> On 11/03/2010 16:20, Susam Pal wrote:
>>
>> On Thu, Mar 11, 2010 at 8:24 PM, Graziano Aliberti
>> <graziano.aliberti@eng.it>  wrote:
>>
>>>
>>> Hi everyone,
>>>
>>> I'm trying to use Nutch 1.0 on a system behind a Squid proxy. When
>>> I try to fetch my website list, the log file shows that
>>> authentication failed.
>>>
>>> I've configured my nutch-site.xml file with all the properties needed
>>> for proxy auth, but I get this error: "httpclient.HttpMethodDirector -
>>> No credentials available for BASIC 'Squid proxy-caching web
>>> server'@proxy.my.host:my.port"
>>>
>>>
>>
>> Did you replace 'protocol-http' with 'protocol-httpclient' in the
>> value for the 'plugin.includes' property in 'conf/nutch-site.xml'?
>>
>> Regards,
>> Susam Pal
>>
>>
>>
>
> Hi Susam,
>
> Yes, of course! :) Here is my configuration file:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>
> <property>
> <name>http.agent.name</name>
> <value>my.agent.name</value>
> <description>
> </description>
> </property>
>
> <property>
> <name>plugin.includes</name>
> <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> <description>
> </description>
> </property>
>
> <property>
> <name>http.auth.file</name>
> <value>my_file.xml</value>
> <description>Authentication configuration file for
>  'protocol-httpclient' plugin.
> </description>
> </property>
>
> <property>
> <name>http.proxy.host</name>
> <value>ip.my.proxy</value>
> <description>The proxy hostname.  If empty, no proxy is used.</description>
> </property>
>
> <property>
> <name>http.proxy.port</name>
> <value>my.port</value>
> <description>The proxy port.</description>
> </property>
>
> <property>
> <name>http.proxy.username</name>
> <value>my.user</value>
> <description>
> </description>
> </property>
>
> <property>
> <name>http.proxy.password</name>
> <value>my.pwd</value>
> <description>
> </description>
> </property>
>
> <property>
> <name>http.proxy.realm</name>
> <value>my_realm</value>
> <description>
> </description>
> </property>
>
> <property>
> <name>http.agent.host</name>
> <value>my.local.pc</value>
> <description>The agent host.</description>
> </property>
>
> <property>
> <name>http.useHttp11</name>
> <value>true</value>
> <description>
> </description>
> </property>
>
> </configuration>
>
> One more question: where must I put the user authentication parameters
> (user, pwd)? In the nutch-site.xml file, or in the my_file.xml that I use
> for authentication?
>
> Thank you for your attention,
>
>
> --
> -----------
>
> Graziano Aliberti
>
> Engineering Ingegneria Informatica S.p.A
>
> Via S. Martino della Battaglia, 56 - 00185 ROMA
>
> *Tel.:* 06.49.201.387
>
> *E-Mail:* graziano.aliberti@eng.it
>
>
>

The configuration looks okay to me. Yes, the proxy authentication
details are set in 'conf/nutch-site.xml'. The file named in the
'http.auth.file' property holds the credentials for authenticating to
web servers, not to the proxy.
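
For comparison, here is a minimal sketch of what that web server
credentials file might look like, assuming it follows the format of the
sample 'conf/httpclient-auth.xml' shipped with Nutch. The username,
password, host, port and realm below are hypothetical placeholders, so
please check the element names against the sample file in your version.

```xml
<?xml version="1.0"?>
<!-- Sketch of the file named by http.auth.file (web server auth only).
     All attribute values are hypothetical placeholders. -->
<auth-configuration>
  <credentials username="web.user" password="web.pwd">
    <!-- Scope these credentials to one server, port and realm. -->
    <authscope host="www.example.com" port="80" realm="example_realm"/>
  </credentials>
</auth-configuration>
```

Nothing in this file is consulted for the proxy itself.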

Unfortunately, there aren't any log statements in the part of the code
that reads the proxy authentication details, so turning on debug logs
won't give you any clues about the issue. However, in case you want to
troubleshoot it yourself by building Nutch from source, I can point you
to the code that deals with this.

The file is src/java/org/apache/nutch/protocol/httpclient/Http.java
in the protocol-httpclient plugin:
http://svn.apache.org/viewvc/lucene/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java?view=markup

The line number is: 200.

If I get time this weekend, I will try to insert some log statements
into this code and send you a modified JAR file, which might help you
troubleshoot what is going on. I can't promise this, though, since it
depends on my weekend plans.

Two questions before I end this mail. Did you set the value of the
'http.proxy.realm' property to "Squid proxy-caching web server"?
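
The realm is the quoted string in your error message. Assuming the
proxy really announces that realm, the property in
'conf/nutch-site.xml' would look something like this sketch:

```xml
<property>
  <name>http.proxy.realm</name>
  <!-- Assumption: realm copied from the "No credentials available for
       BASIC '...'" log line; use whatever your proxy actually reports. -->
  <value>Squid proxy-caching web server</value>
  <description>Authentication realm for the proxy.</description>
</property>
```

As far as I recall from the property description in nutch-default.xml,
leaving the value undefined should apply the credentials to any realm,
which may be another thing worth trying.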

Also, do you see any 'auth.AuthChallengeProcessor' lines in the log
file? I'm not sure whether this line should appear for proxy
authentication but it does appear for web server authentication.

Regards,
Susam Pal
