nutch-user mailing list archives

From Alexander Aristov <alexander.aris...@gmail.com>
Subject Re: Registered plugin never invoked and urls skipped
Date Sun, 10 May 2009 06:08:59 GMT
parse-plugins.xml says which plugin should be invoked for a particular MIME
type, but activation/deactivation is done in nutch-site.xml.

Check which plugins are activated.
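
As a rough sketch of the two places involved (not a drop-in config): the
plugin ids parse-articlePage and index-xenbase are taken from the log in
this thread; the other entries in the plugin.includes value and the
extension-id org.example.parse.ArticlePageParser are assumptions for
illustration and have to match your own setup and the plugin's plugin.xml.

```xml
<!-- nutch-site.xml: plugin.includes is a regular expression over plugin ids.
     Only plugins whose id matches it are activated. The entries other than
     parse-articlePage and index-xenbase are illustrative. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|articlePage)|index-(basic|xenbase)|query-(basic|site|url)</value>
</property>

<!-- parse-plugins.xml: maps a MIME type to the parse plugins to try, in order. -->
<parse-plugins>
  <mimeType name="text/html">
    <plugin id="parse-articlePage" />
    <plugin id="parse-html" />
  </mimeType>
  <aliases>
    <!-- Hypothetical extension-id for the custom parser; use the id
         declared in the plugin's own plugin.xml. -->
    <alias name="parse-articlePage"
           extension-id="org.example.parse.ArticlePageParser" />
    <alias name="parse-html"
           extension-id="org.apache.nutch.parse.html.HtmlParser" />
  </aliases>
</parse-plugins>
```

If the plugin id is missing from plugin.includes, or the MIME type has no
entry pointing at it, the plugin may be left out of parsing even though it
shows up elsewhere in the configuration.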

Best Regards
Alexander Aristov


2009/5/8 Kenan Azam <azam.kenan@gmail.com>

> Thanks Alexander. I tried that, but again the plugin is registered but
> not used. The mime-type is html. I had not entered my other plugins in
> parse-plugins.xml, but they were still running.
>
> The other thing I don't get is that all urls starting with
> literature/article.do are not being indexed by any of my plugins. Maybe the
> fetching process is somehow scoring them and deciding that they are not
> worth indexing.
>
> I am using boost values, so could this be a possibility?
>
> Again, these urls get fetched but never indexed.
> hadoop.log file shows
> 2009-05-07 14:32:23,048 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=5966
> 2009-05-07 14:32:23,049 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=9196
> 2009-05-07 14:32:23,051 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6247
> 2009-05-07 14:32:23,052 INFO  fetcher.Fetcher - fetching
>
> Thanks, Kenan.
> On Thu, May 7, 2009 at 11:12 PM, Alexander Aristov <
> alexander.aristov@gmail.com> wrote:
>
> > Did you assign a mime type to this plugin? What is it?
> >
> > It's in the parse-plugins.xml file. Unless you do that, Nutch won't know
> > if it should invoke your plugin for processing particular pages.
> >
> >
> > Best Regards
> > Alexander Aristov
> >
> >
> > 2009/5/8 kazam <azam.kenan@gmail.com>
> >
> > >
> > > Hi there,
> > > I am using nutch-0.8.1 and I have 5 custom plugins. From the logs, all
> > > of those plugins seem to get used except one. Also, the urls it was
> > > written for are skipped altogether.
> > >
> > > Here are some pieces from hadoop.log file
> > > 2009-05-07 14:27:41,227 INFO  plugin.PluginRepository - Registered Plugins:
> > > .....
> > > .........
> > > 2009-05-07 14:27:41,228 INFO  plugin.PluginRepository -         Xenbase Indexer (index-xenbase)
> > > 2009-05-07 14:27:41,228 INFO  plugin.PluginRepository -         Article Display Page Parser (parse-articlePage)
> > >
> > > The last plugin --> parse-articlePage is never used.
> > >
> > > I wrote this plugin to index urls of the type
> > > http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=670
> > >
> > > Again, these urls get fetched but never indexed.
> > > hadoop.log file shows
> > > 2009-05-07 14:32:23,048 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=5966
> > > 2009-05-07 14:32:23,049 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=9196
> > > 2009-05-07 14:32:23,051 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6247
> > > 2009-05-07 14:32:23,052 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6223
> > >
> > > Am I missing some configuration, or is there a bug in the plugin? I
> > > don't see any exceptions being thrown.
> > >
> > > Thanks for any pointers.
> > >
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://www.nabble.com/Registered-plugin-never-invoked-and-urls-skipped-tp23435093p23435093.html
> > > Sent from the Nutch - User mailing list archive at Nabble.com.
> > >
> > >
> >
>
