I get it now ... Duh :0)

Output is fine for me. What is wrong with your results Remi?

On Tue, Dec 13, 2011 at 7:09 PM, remi tassing wrote:
> Pls check Markus's earlier email on the format. It seems to be working, but
> the output is still incorrect for me.
>
> On Tuesday, December 13, 2011, Lewis John Mcgibbney <lewis.mcgibbney@gmail.com> wrote:
>> Here's my output from URLFilterChecker [1]
>>
>> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker -filterName urlfilter-regex
>> Exception in thread "main" java.lang.RuntimeException: Filter urlfilter-regex not found.
>>        at org.apache.nutch.net.URLFilterChecker.checkOne(URLFilterChecker.java:66)
>>        at org.apache.nutch.net.URLFilterChecker.main(URLFilterChecker.java:126)
>> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined
>> Checking combination of all URLFilters available
>> ^Z
>> [10]+  Stopped                 bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined
>> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker -filterName RegexURLFilter
>> Exception in thread "main" java.lang.RuntimeException: Filter RegexURLFilter not found.
>>        at org.apache.nutch.net.URLFilterChecker.checkOne(URLFilterChecker.java:66)
>>        at org.apache.nutch.net.URLFilterChecker.main(URLFilterChecker.java:126)
>>
>> I'm noticing three things
>>
>> 1) NO reference to a single urlfilter seems to work when appended to
>> the -filterName parameter e.g. regex-urlfilter, urlfilter-regex,
>> RegexURLFilter, regex-urlfilter.txt
>> 2) When no -filterName parameter is passed but a value is passed e.g.
>> bin/nutch org.apache.nutch.net.URLFilterChecker regex-urlfilter, the log
>> output is as follows
>> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker regex-urlfilter
>> Checking combination of all URLFilters available
>> Therefore it seems to incorrectly skip to the checkAll method then hang!
>> 3) If the -allCombined parameter is passed the output indicates that
>> it does the same as 2) above...
>>
>> Can you please check if you are getting the same behaviour Markus? Thank you
>>
>> [1] http://svn.apache.org/repos/asf/nutch/trunk/src/java/org/apache/nutch/net/URLFilterChecker.java
>>
>> On Tue, Dec 13, 2011 at 5:06 PM, Markus Jelsma wrote:
>>> i see no log output mate :)
>>>
>>> On Tuesday 13 December 2011 17:58:36 you wrote:
>>>> Thanks Markus.
>>>>
>>>> Can you look at my log output and inform where I am going wrong
>>>> please? It seemed to be playing up for me.
>>>>
>>>> Thanks
>>>>
>>>> On Tue, Dec 13, 2011 at 4:53 PM, Markus Jelsma wrote:
>>>> > I've never seen it hanging and use it weekly.
>>>> >
>>>> > On Tuesday 13 December 2011 17:45:54 you wrote:
>>>> >> Hi,
>>>> >>
>>>> >> Can anyone confirm if this is an issue?
>>>> >>
>>>> >> If so I think we should log it before it goes unnoticed.
>>>> >>
>>>> >> Thanks
>>>> >>
>>>> >> Lewis
>>>> >>
>>>> >> On Fri, Dec 9, 2011 at 3:21 PM, Lewis John Mcgibbney wrote:
>>>> >> > If you look at the output I posted, even when I specified a particular
>>>> >> > filter, the checkAll() method is still getting called, as is indicated
>>>> >> > by the "Checking combination of all URLFilters available" log output.
>>>> >> > It's not a particularly complex class, so hopefully if we can confirm
>>>> >> > this is a bug we can fix it quickly.
>>>> >> >
>>>> >> > Finally, I must ask, Remi which URL filters have you included in your
>>>> >> > plugin.includes property in nutch-site.xml after building Nutch?
>>>> >> >
>>>> >> > On Fri, Dec 9, 2011 at 3:11 PM, Lewis John Mcgibbney wrote:
>>>> >> >> Hi Remi & Markus,
>>>> >> >>
>>>> >> >> Yeah, I can replicate this, good catch Remi.
>>>> >> >>
>>>> >> >> lewis@lewis-desktop:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker http://www.heraldscotland.com -filterName regex-urlfilter.txt
>>>> >> >>
>>>> >> >> Checking combination of all URLFilters available
>>>> >> >> ^Z
>>>> >> >> [2]+  Stopped                 bin/nutch org.apache.nutch.net.URLFilterChecker http://www.heraldscotland.com -filterName regex-urlfilter.txt
>>>> >> >> lewis@lewis-desktop:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLFilterChecker http://www.heraldscotland.com -filterName regex-urlfilter
>>>> >> >>
>>>> >> >> Checking combination of all URLFilters available
>>>> >> >>
>>>> >> >> The first instance was hanging, so was the second. This needs some
>>>> >> >> further investigation I think. Can someone else please confirm before
>>>> >> >> we log this in Jira?
>>>> >> >>
>>>> >> >> Thanks for reporting
>>>> >> >>
>>>> >> >>
>>>> >> >> On Fri, Dec 9, 2011 at 12:53 PM, remi tassing <tassingremi@gmail.com> wrote:
>>>> >> >>> I fed with URL but it didn't work:
>>>> >> >>>
>>>> >> >>> $ bin/nutch org.apache.nutch.net.URLFilterChecker http://www.google.com
>>>> >> >>> Checking combination of all URLFilters available
>>>> >> >>>
>>>> >> >>> Remi
>>>> >> >>>
>>>> >> >>> On Fri, Dec 9, 2011 at 2:43 PM, Markus Jelsma wrote:
>>>> >> >>> > it reads from stdin so you can either type a url followed by enter
>>>> >> >>> > or feed from stdin using pipes.
>>>> >> >>> >
>>>> >> >>> > On Friday 09 December 2011 13:32:41 remi tassing wrote:
>>>> >> >>> > > Hello guys,
>>>> >> >>> > >
>>>> >> >>> > > how do you use "org.apache.nutch.net.URLFilterChecker"?
>>>> >> >>> > > It's not documented
>>>> >> >>> > > and it always shows me this "Checking combination of all
>>>> >> >>> > > URLFilters available" and then gets stuck.
>>>> >> >>> > >
>>>> >> >>> > > Remi
>>>> >> >>> >
>>>> >> >>> > --
>>>> >> >>> > Markus Jelsma - CTO - Openindex
>>>> >> >>>
>>>> >> >>> --
>>>> >> >>> Remi Tassing
>>>> >> >>
>>>> >> >> --
>>>> >> >> Lewis
>>>> >> >
>>>> >> > --
>>>> >> > Lewis
>>>> >
>>>> > --
>>>> > Markus Jelsma - CTO - Openindex
>>>
>>> --
>>> Markus Jelsma - CTO - Openindex
>>
>>
>>
>> --
>> Lewis
>>

--
Lewis
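
For anyone finding this thread later: Markus's note that the checker reads from stdin is the likely explanation for the "hang" after "Checking combination of all URLFilters available". The process is probably just waiting for a URL to be typed or piped in. Below is a minimal Java sketch of that kind of read loop. It is NOT the Nutch source: the class name StdinFilterSketch, the check() helper, the "+"/"-" result prefixes and the reject pattern are all assumptions made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a stdin-driven URL checker; not the Nutch class.
final class StdinFilterSketch {

    // Reads one URL per line until end of input.
    // Returns "+url" for accepted URLs and "-url" for rejected ones.
    static List<String> check(BufferedReader in, Pattern reject) throws IOException {
        List<String> results = new ArrayList<>();
        String line;
        // readLine() blocks until a line arrives; on an idle terminal this
        // looks exactly like a hang.
        while ((line = in.readLine()) != null) {
            String url = line.trim();
            if (url.isEmpty()) {
                continue; // skip blank lines
            }
            boolean rejected = reject.matcher(url).find();
            results.add((rejected ? "-" : "+") + url);
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // Assumed rule for illustration only: reject common image suffixes.
        Pattern reject = Pattern.compile("\\.(gif|jpg|png)$");
        System.err.println("Checking URLs from stdin");
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        for (String result : check(stdin, reject)) {
            System.out.println(result);
        }
    }
}
```

With the real tool, the same idea would presumably be to pipe the URL in rather than pass it as an argument, e.g. echo "http://www.google.com" | bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined, or to type the URL and press enter once the "Checking combination..." message appears.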