nutch-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: "URLFilterChecker" documentation
Date Fri, 09 Dec 2011 15:21:59 GMT
If you look at the output I posted, even when I specified a particular
filter, the checkAll() method is still getting called, as is indicated by
the "Checking combination of all URLFilters available" log output. It's not
a particularly complex class, so hopefully if we can confirm this is a bug
we can fix it quickly.

Finally, I must ask: Remi, which URL filters have you included in the
plugin.includes property in nutch-site.xml after building Nutch?

On Fri, Dec 9, 2011 at 3:11 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Remi & Markus,
>
> Yeah, I can replicate this, good catch Remi.
>
> lewis@lewis-desktop:~/ASF/trunk/runtime/local$ bin/nutch
> org.apache.nutch.net.URLFilterChecker http://www.heraldscotland.com-filterName regex-urlfilter.txt
>
> Checking combination of all URLFilters available
> ^Z
> [2]+  Stopped                 bin/nutch
> org.apache.nutch.net.URLFilterChecker http://www.heraldscotland.com-filterName regex-urlfilter.txt
> lewis@lewis-desktop:~/ASF/trunk/runtime/local$ bin/nutch
> org.apache.nutch.net.URLFilterChecker http://www.heraldscotland.com-filterName regex-urlfilter
>
> Checking combination of all URLFilters available
>
> The first instance hung, and so did the second. This needs some further
> investigation, I think. Can someone else please confirm before we log this
> in Jira?
>
> Thanks for reporting
>
>
> On Fri, Dec 9, 2011 at 12:53 PM, remi tassing <tassingremi@gmail.com> wrote:
>
>> I fed it a URL but it didn't work:
>>
>> $ bin/nutch org.apache.nutch.net.URLFilterChecker http://www.google.com
>> Checking combination of all URLFilters available
>>
>> Remi
>>
>> On Fri, Dec 9, 2011 at 2:43 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
>>
>> > It reads from stdin, so you can either type a URL followed by Enter or
>> > feed it from stdin using pipes.
>> >
>> > On Friday 09 December 2011 13:32:41 remi tassing wrote:
>> > > Hello guys,
>> > >
>> > > how do you use "org.apache.nutch.net.URLFilterChecker"? It's not
>> > > documented and it always shows me this "Checking combination of all
>> > > URLFilters available" and then gets stuck.
>> > >
>> > > Remi
>> >
>> > --
>> > Markus Jelsma - CTO - Openindex
>> >
>>
>>
>>
>> --
>> Remi Tassing
>>
>
>
>
> --
> *Lewis*
>
>


-- 
*Lewis*
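
Markus's point in the thread, that the checker reads URLs from stdin, may explain why the command appears to hang: it is probably just waiting for input. The sketch below illustrates that behaviour with a hypothetical stand-in function, check_urls, which is not Nutch code. The http/https pattern and the "+"/"-" output prefixes are assumptions chosen for illustration only.

```shell
#!/bin/sh
# Hypothetical stand-in for a stdin-driven URL checker (NOT Nutch code).
# Reads one URL per line from standard input and prints "+URL" when the
# URL matches the accepted pattern, "-URL" otherwise.
check_urls() {
    while IFS= read -r url; do
        case "$url" in
            http://*|https://*) printf '%s%s\n' '+' "$url" ;;  # accepted
            *)                  printf '%s%s\n' '-' "$url" ;;  # rejected
        esac
    done
}

# Feed a URL through a pipe: the loop ends when stdin reaches end-of-file.
echo "http://www.google.com" | check_urls

# Several URLs can be fed at once, one per line.
printf '%s\n' "http://www.heraldscotland.com" "ftp://example.org" | check_urls

# Called with no pipe or redirect, check_urls would block waiting for typed
# input, which looks like a hang until a line is entered or Ctrl-D is pressed.
```

With Nutch itself, the equivalent would presumably be to pipe the URL into the bin/nutch org.apache.nutch.net.URLFilterChecker command, using whatever options your version's usage message lists. That invocation was not verified here.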
