nutch-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: "URLFilterChecker" documentation
Date Tue, 13 Dec 2011 16:45:54 GMT
Hi,

Can anyone confirm if this is an issue?

If so, I think we should log it in Jira before it gets forgotten.

Thanks

Lewis

On Fri, Dec 9, 2011 at 3:21 PM, Lewis John Mcgibbney
<lewis.mcgibbney@gmail.com> wrote:
> If you look at the output I posted, even when I specified a particular
> filter, the checkAll() method is still getting called, as is indicated by
> the "Checking combination of all URLFilters available" log output. It's not
> a particularly complex class, so hopefully if we can confirm this is a bug
> we can fix it quickly.
>
> Finally, I must ask: Remi, which URL filters have you included in the
> plugin.includes property in your nutch-site.xml after building Nutch?
>
> On Fri, Dec 9, 2011 at 3:11 PM, Lewis John Mcgibbney
> <lewis.mcgibbney@gmail.com> wrote:
>>
>> Hi Remi & Markus,
>>
>> Yeah, I can replicate this, good catch Remi.
>>
>> lewis@lewis-desktop:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker http://www.heraldscotland.com -filterName regex-urlfilter.txt
>>
>> Checking combination of all URLFilters available
>> ^Z
>> [2]+  Stopped                 bin/nutch org.apache.nutch.net.URLFilterChecker http://www.heraldscotland.com -filterName regex-urlfilter.txt
>> lewis@lewis-desktop:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker http://www.heraldscotland.com -filterName regex-urlfilter
>>
>> Checking combination of all URLFilters available
>>
>> The first invocation hung, and so did the second. I think this needs
>> further investigation. Can someone else please confirm before we log this
>> in Jira?
>>
>> Thanks for reporting
>>
>>
>> On Fri, Dec 9, 2011 at 12:53 PM, remi tassing <tassingremi@gmail.com>
>> wrote:
>>>
>>> I fed it a URL, but it didn't work:
>>>
>>> $ bin/nutch org.apache.nutch.net.URLFilterChecker http://www.google.com
>>> Checking combination of all URLFilters available
>>>
>>> Remi
>>>
>>> On Fri, Dec 9, 2011 at 2:43 PM, Markus Jelsma
>>> <markus.jelsma@openindex.io> wrote:
>>>
>>> > It reads from stdin, so you can either type a URL followed by Enter or
>>> > feed it URLs through a pipe.
>>> >
>>> > On Friday 09 December 2011 13:32:41 remi tassing wrote:
>>> > > Hello guys,
>>> > >
>>> > > How do you use "org.apache.nutch.net.URLFilterChecker"? It's not
>>> > > documented, and it always shows me "Checking combination of all
>>> > > URLFilters available" and then gets stuck.
>>> > >
>>> > > Remi
>>> >
>>> > --
>>> > Markus Jelsma - CTO - Openindex
>>> >
>>>
>>>
>>>
>>> --
>>> Remi Tassing
>>
>>
>>
>>
>> --
>> Lewis
>>
>
>
>
> --
> Lewis
>



-- 
Lewis
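
For anyone trying to reproduce this, Markus's suggestion quoted above (feeding URLs on stdin) might look roughly like the sketch below. It is only a sketch under assumptions: the NUTCH variable and the list_urls helper are made up for illustration, and the -allCombined argument is what I recall from the class's usage string, so please check the usage output of your own build before relying on it. If the checker really is waiting on stdin, closing the pipe should let it finish instead of appearing to hang.

```shell
#!/bin/sh
# Assumption: run from a Nutch runtime/local directory, as in the session
# quoted above. NUTCH is a hypothetical variable used only in this sketch.
NUTCH="${NUTCH:-bin/nutch}"

# Hypothetical helper: print the test URLs, one per line.
list_urls() {
  printf '%s\n' "http://www.google.com" "http://www.heraldscotland.com"
}

# Pipe the URLs into the checker. When the pipe closes, stdin reaches EOF,
# so a process that was only waiting for input should exit.
# -allCombined is an assumed argument; verify it against the usage message.
if [ -x "$NUTCH" ]; then
  list_urls | "$NUTCH" org.apache.nutch.net.URLFilterChecker -allCombined
else
  echo "nutch launcher not found at $NUTCH" >&2
fi
```

Running the class interactively and typing a URL followed by Enter should be equivalent, going by Markus's description; Ctrl-D would then end the input.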
