nutch-user mailing list archives

From remi tassing <tassingr...@gmail.com>
Subject Re: "URLFilterChecker" documentation
Date Sat, 17 Dec 2011 18:00:00 GMT
It actually works fine!

I had accidentally left a "+." at the beginning of regex-urlfilter.txt and
put "-." only at the end.
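Remi's fix follows from how regex-urlfilter.txt is evaluated. According to the comments in Nutch's default copy of that file, the first matching pattern decides whether a URL is included or ignored, and a URL that matches no pattern is ignored. A leading "+." therefore accepts every URL, and the trailing "-." is never consulted. The Java sketch below is a hypothetical, self-contained model of that ordering, not Nutch's RegexURLFilter. The class and method names, the example.com rule and the use of Matcher.find() (substring matching) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not Nutch's RegexURLFilter: it mimics the rule order
// documented in regex-urlfilter.txt (first matching rule wins; a URL that
// matches no rule is rejected). Substring matching via find() is an assumption.
class RegexRuleSketch {

    /** One rule line: '+' accepts, '-' rejects, followed by a regex. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    /** Parses rule lines, skipping blank lines and '#' comments. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** Returns the URL if the first matching rule accepts it, otherwise null. */
    static String filter(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null; // later rules are never consulted
            }
        }
        return null; // no rule matched
    }

    public static void main(String[] args) {
        // The ordering mistake described above: "+." first, so nothing after it is reached.
        List<Rule> broken = parse(List.of("+.", "-\\.gif$", "-."));
        // Hypothetical intended order: exclusions, the site to crawl, catch-all reject.
        List<Rule> fixed = parse(List.of(
                "-\\.gif$",
                "+^https?://([a-z0-9-]+\\.)*example\\.com/",
                "-."));

        System.out.println(filter(broken, "http://other.org/logo.gif"));       // accepted by "+."
        System.out.println(filter(fixed, "http://www.example.com/page.html")); // accepted
        System.out.println(filter(fixed, "http://example.com/logo.gif"));      // null
        System.out.println(filter(fixed, "http://other.org/"));                // null
    }
}
```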

Thanks to Mark and Lewis!

Remi

On Tuesday, December 13, 2011, Lewis John Mcgibbney <lewis.mcgibbney@gmail.com> wrote:
> I get it now ... Duh :0)
>
> Output is fine for me. What is wrong with your results Remi?
>
> On Tue, Dec 13, 2011 at 7:09 PM, remi tassing <tassingremi@gmail.com> wrote:
>> Please check Markus's earlier email on the format. It seems to be working, but
>> the output is still incorrect for me.
>>
>> On Tuesday, December 13, 2011, Lewis John Mcgibbney <lewis.mcgibbney@gmail.com> wrote:
>>> Here's my output from URLFilterChecker [1]
>>>
>>> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker -filterName urlfilter-regex
>>> Exception in thread "main" java.lang.RuntimeException: Filter urlfilter-regex not found.
>>>        at org.apache.nutch.net.URLFilterChecker.checkOne(URLFilterChecker.java:66)
>>>        at org.apache.nutch.net.URLFilterChecker.main(URLFilterChecker.java:126)
>>> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined
>>> Checking combination of all URLFilters available
>>> ^Z
>>> [10]+  Stopped                 bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined
>>> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker -filterName RegexURLFilter
>>> Exception in thread "main" java.lang.RuntimeException: Filter RegexURLFilter not found.
>>>        at org.apache.nutch.net.URLFilterChecker.checkOne(URLFilterChecker.java:66)
>>>        at org.apache.nutch.net.URLFilterChecker.main(URLFilterChecker.java:126)
>>>
>>> I'm noticing three things
>>>
>>> 1) NO reference to a single URL filter seems to work when passed to
>>> the -filterName parameter, e.g. regex-urlfilter, urlfilter-regex,
>>> RegexURLFilter, regex-urlfilter.txt.
>>> 2) When no -filterName parameter is passed but a value is, e.g.
>>> bin/nutch org.apache.nutch.net.URLFilterChecker regex-urlfilter, the log
>>> output is as follows:
>>> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker regex-urlfilter
>>> Checking combination of all URLFilters available
>>> Therefore it seems to incorrectly skip to the checkAll method and then hang!
>>> 3) If the -allCombined parameter is passed, the output indicates that
>>> it does the same as 2) above...
>>>
>>> Can you please check if you are getting the same behaviour, Markus? Thank you
>>>
>>> [1] http://svn.apache.org/repos/asf/nutch/trunk/src/java/org/apache/nutch/net/URLFilterChecker.java
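Lewis's errors and the apparent hang are consistent with the following reading of the linked URLFilterChecker source, which is an assumption not confirmed in this thread and may differ between Nutch versions: the -filterName value is compared with each loaded filter's fully qualified class name (for the regex plugin, presumably org.apache.nutch.urlfilter.regex.RegexURLFilter), and the checker then reads URLs from standard input, one per line, printing each with a "+" (kept) or "-" (rejected) prefix. With nothing piped in, it simply waits for input. The sketch below is a hypothetical, Nutch-free emulation of that behaviour. FilterCheckerSketch, select, check and the ".gif" stand-in filter are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical emulation of URLFilterChecker as read from the linked source
// (an assumption, not Nutch code): the -filterName value is compared with each
// filter's fully qualified class name, then URLs are read one per line until
// end of input and echoed with a "+" or "-" prefix.
class FilterCheckerSketch {

    /** Exact class-name lookup; short names such as "urlfilter-regex" fail. */
    static UnaryOperator<String> select(Map<String, UnaryOperator<String>> filters, String name) {
        UnaryOperator<String> filter = filters.get(name);
        if (filter == null) {
            throw new RuntimeException("Filter " + name + " not found.");
        }
        return filter;
    }

    /** A filter returns the URL to keep it, or null to reject it. */
    static void check(UnaryOperator<String> filter, Reader input, Writer output) throws IOException {
        BufferedReader in = new BufferedReader(input);
        PrintWriter out = new PrintWriter(output);
        String line;
        // With System.in attached to a terminal, readLine() blocks here until a URL
        // is typed or end of input (Ctrl-D) is sent, which looks like a hang.
        while ((line = in.readLine()) != null) {
            String kept = filter.apply(line);
            out.println(kept != null ? "+" + kept : "-" + line);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        Map<String, UnaryOperator<String>> filters = new LinkedHashMap<>();
        // Stand-in filter registered under the regex plugin's class name (illustrative only).
        filters.put("org.apache.nutch.urlfilter.regex.RegexURLFilter",
                url -> url.endsWith(".gif") ? null : url);

        // A StringReader replaces System.in so this demo terminates on its own.
        Reader urls = new StringReader("http://example.com/a.html\nhttp://example.com/logo.gif\n");
        check(select(filters, "org.apache.nutch.urlfilter.regex.RegexURLFilter"),
                urls, new OutputStreamWriter(System.out));
        // select(filters, "urlfilter-regex") would throw "Filter urlfilter-regex not found."
    }
}
```

If that reading is right, piping URLs in, e.g. echo "http://example.com/" | bin/nutch org.apache.nutch.net.URLFilterChecker -filterName org.apache.nutch.urlfilter.regex.RegexURLFilter, should print one +/- line per URL and then exit. The plugin presumably also has to be enabled in plugin.includes to be found at all.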
>>>
>>> On Tue, Dec 13, 2011 at 5:06 PM, Markus Jelsma
>>> <markus.jelsma@openindex.io> wrote:
>>>> I see no log output, mate :)
>>>>
>>>> On Tuesday 13 December 2011 17:58:36 you wrote:
>>>>> Thanks Markus.
>>>>>
>>>>> Can you look at my log output and tell me where I am going wrong,
>>>>> please? It seemed to be playing up for me.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Tue, Dec 13, 2011 at 4:53 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
>>>>> > I've never seen it hanging and use it weekly.
>>>>> >
>>>>> > On Tuesday 13 December 2011 17:45:54 you wrote:
>>>>> >> Hi,
>>>>> >>
>>>>> >> Can anyone confirm if this is an issue?
>>>>> >>
>>>>> >> If so, I think we should log it before it goes unnoticed.
>>>>> >>
>>>>> >> Thanks
>>>>> >>
>>>>> >> Lewis
>>>>> >>
>>>>> >> On Fri, Dec 9, 2011 at 3:21 PM, Lewis John Mcgibbney
>>>>> >>
>>>>> >> --
> Lewis
>
