nutch-user mailing list archives

From zo tiger <zo.ti...@hotmail.com>
Subject Re: Help me, No urls to fetch.
Date Mon, 07 Sep 2009 10:31:19 GMT

Oh, I resolved it. Nutch is running now. Great.

I had forgotten to copy all the conf files to the other slave nodes.

I had set up the config files only on the master node, not on all the slave nodes.
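The fix above comes down to keeping the conf directory identical on every node. Below is a minimal Python sketch of that step. It assumes rsync is available on all machines and that the slave host names are listed one per line in a Hadoop-style `conf/slaves` file. The host names and the `/nutch/search/conf` path are hypothetical. The sketch only builds and prints the commands, so they can be reviewed before anything is run.

```python
import shlex
from typing import List


def build_sync_commands(slaves_text: str, conf_dir: str, remote_dir: str) -> List[List[str]]:
    """Return one rsync command (as an argv list) per slave host.

    slaves_text is the content of a Hadoop-style slaves file, one host per line.
    Blank lines and lines starting with '#' are skipped.
    """
    commands: List[List[str]] = []
    # The trailing slash makes rsync copy the directory's contents into remote_dir.
    src = conf_dir.rstrip("/") + "/"
    for line in slaves_text.splitlines():
        host = line.strip()
        if not host or host.startswith("#"):
            continue
        commands.append(["rsync", "-av", src, f"{host}:{remote_dir}"])
    return commands


if __name__ == "__main__":
    # Hypothetical host names and paths, for illustration only.
    example_slaves = "slave1\nslave2\n\n# slave3 is down\n"
    for argv in build_sync_commands(example_slaves, "/nutch/search/conf", "/nutch/search/conf"):
        # Printed for review; subprocess.run(argv, check=True) would execute it.
        print(shlex.join(argv))
```

Building argv lists instead of shell strings keeps host names and paths from being reinterpreted by a shell when the commands are eventually run.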

Thanks to Paul Tomblin, MilleBii and 皮皮 for their help.

Thank you very much.
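The "No urls to fetch check your filter and seed list" error quoted below points at the seed list or the URL filters. One quick way to check whether the seed URLs survive the filter rules at all is an offline test. Below is a minimal Python sketch built on these assumptions: the regex URL filter reads `+regex` / `-regex` lines from top to bottom, the first rule whose pattern occurs in the URL decides, and a URL that matches no rule is dropped. The sketch applies no URL normalization, and the example rules and seeds are hypothetical.

```python
import re
from typing import List, Tuple

# (include?, compiled pattern) for one "+regex" or "-regex" line.
Rule = Tuple[bool, re.Pattern]


def parse_rules(filter_text: str) -> List[Rule]:
    """Turn '+regex' / '-regex' lines into rules; skip comments and blank lines."""
    rules: List[Rule] = []
    for line in filter_text.splitlines():
        if not line or line[0] not in "+-":
            continue
        # Everything after the sign, including any trailing space, is the pattern.
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepted(url: str, rules: List[Rule]) -> bool:
    """The first rule whose pattern occurs in the URL decides; no match means reject."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False


def rejected_seeds(seed_text: str, filter_text: str) -> List[str]:
    """Return the seed URLs that the rules reject."""
    rules = parse_rules(filter_text)
    seeds = [line.strip() for line in seed_text.splitlines() if line.strip()]
    return [url for url in seeds if not accepted(url, rules)]


if __name__ == "__main__":
    # Hypothetical rules modelled on the usual crawl-urlfilter.txt layout.
    example_rules = "# accept hosts under apache.org\n+^http://([a-z0-9]*\\.)*apache.org/\n# reject everything else\n-.\n"
    example_seeds = "http://lucene.apache.org/\nhttp://example.com/\n"
    print(rejected_seeds(example_seeds, example_rules))  # ['http://example.com/']
```

If every seed comes back when the real seed file and filter file are passed in, the filter is the likely culprit. In this thread the actual cause was different: the slave nodes lacked the conf files, which a check run only on the master would not reveal.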


MilleBii wrote:
> 
> Obviously you've checked the crawl-urlfilter.txt rules.
> Beware of a nasty thing that can happen: make sure each rule is followed
> directly by the CR/LF line ending. I recently had a problem because some
> "invisible" spaces were following one rule, so that rule never matched...
> it took me a while to figure out.
> 
> 
> 2009/9/7 zo tiger <zo.tiger@hotmail.com>
> 
>>
>> These are the contents of my hadoop.log file:
>>
>>
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Text Parse Plug-in (parse-text)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         XML Response Writer Plug-in (response-xml)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Anchor Indexing Filter (index-anchor)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         JavaScript Parser (parse-js)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         URL Query Filter (query-url)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         JSON Response Writer Plug-in (response-json)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository - Registered Extension-Points:
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>> 2009-09-07 03:32:58,137 INFO  plugin.PluginRepository -         Nutch Field Filter (org.apache.nutch.indexer.field.FieldFilter)
>> 2009-09-07 03:32:58,138 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
>> 2009-09-07 03:32:58,138 INFO  plugin.PluginRepository -         Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
>> 2009-09-07 03:32:58,138 INFO  plugin.PluginRepository -         Nutch Search Results Response Writer (org.apache.nutch.searcher.response.ResponseWriter)
>>
>>
>> MilleBii wrote:
>> >
>> > Is there more information in the logs/hadoop.log file?
>> >
>> > What is your plug-in list?
>> >
>> > 2009/9/2 zo tiger <zo.tiger@hotmail.com>
>> >
>> >>
>> >> Thank you for your reply.
>> >>
>> >> In the urls directory (exactly /nutch/search/urls), there is a file
>> >> urllist.txt.
>> >>
>> >> Its content is as follows:
>> >>
>> >>      http://lucene.apache.org
>> >>
>> >> I don't understand why Nutch cannot fetch any URL.
>> >>
>> >>
>> >> Paul Tomblin wrote:
>> >> >
>> >> > On Wed, Sep 2, 2009 at 6:36 AM, zo tiger <zo.tiger@hotmail.com> wrote:
>> >> >>
>> >> >
>> >> >> At last I ran the bin/nutch crawl command, but it gives the error
>> >> >>
>> >> >> No urls to fetch check your filter and seed list
>> >> >>
>> >> >> I am sure there is no problem in crawl-urlfilter.txt or the other
>> >> >> XML configuration files.
>> >> >>
>> >> >> Does anyone know what the problem could be?
>> >> >>
>> >> >
>> >> > What's in your url directory?
>> >> >
>> >> >
>> >> > --
>> >> > http://www.linkedin.com/in/paultomblin
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://www.nabble.com/Help-me%2C-No-urls-to-fetch.-tp25255142p25255944.html
>> >> Sent from the Nutch - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>> > --
>> > -MilleBii-
>> >
>> >
>>
>>
>>
> 
> 
> -- 
> -MilleBii-
> 
> 
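MilleBii's tip above about "invisible" spaces after a rule can be checked mechanically. Below is a minimal Python sketch. It assumes the filter file has one rule per line, with `#` starting a comment, as in crawl-urlfilter.txt. It only flags suspicious whitespace (including non-breaking spaces) and does not reproduce Nutch's own rule parser.

```python
from typing import List, Tuple


def find_hidden_whitespace(filter_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for filter rules with invisible whitespace.

    Line numbers are 1-based. Blank lines and '#' comment lines are skipped.
    """
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(filter_text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        if line != line.rstrip():
            # e.g. "+^http://lucene.apache.org/ ": the space may end up in the pattern,
            # so the rule silently stops matching.
            problems.append((number, "trailing whitespace"))
        if line != line.lstrip():
            # A rule is expected to start with '+' or '-'; a leading space hides the sign.
            problems.append((number, "leading whitespace"))
    return problems
```

For example, `find_hidden_whitespace(open("conf/crawl-urlfilter.txt").read())` (path assumed) returns an empty list when every rule is clean; otherwise it reports each offending line number.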


