nutch-user mailing list archives

From prashant_nutch <>
Subject Re: Help on Activation of Subcollection at Indexing & searching
Date Fri, 30 Mar 2007 12:59:36 GMT

Thanks for your valuable comments on subcollection,
but I still have some issues.
1. Does enabling subcollection in nutch-site.xml take effect at crawl time? Is it
possible to apply it directly to an existing index (that is, at search time)? Can you explain what you meant by
  subcollection also includes a query plugin

I followed the steps you mentioned,
but when I execute a query like

subcollection:<name of subcollection> <word for search>
I still get 0 hits.
Can you explain subcollection in more depth? Our aim is to search within
specific URLs.
Is there any other way besides subcollection?

Enis Soztutar wrote:
> prashant_nutch wrote:
>> Is subcollection useful for searching specific URLs?
>> How do we activate subcollection at indexing and searching time?
>> In conf/subcollection,
>> if we include our URLs in the whitelist, can we then search only those
>> URLs?
>> Command for searching a subcollection:
>> subcollection:<name of subcollection> <word for specific URL>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <subcollections>
>> 	<subcollection>
>> 		<name>nutch</name>
>> 		<id>nutch</id>
>> 		<whitelist>
>> 		</whitelist>
>> 		<blacklist />
>> 	</subcollection>
>> </subcollections>
>> Can anybody explain how the overall thing should work?
>> Is it useful for searching specific URLs? (We are using Nutch 0.8.1.)
> Subcollection is a very useful way to group a set of URLs and then 
> assign a label to them. You can use it to limit searching to certain
> URLs.
> You should first enable subcollection in the nutch-site.xml file.
> Then you should add collections to the conf/subcollection.xml file.
> After indexing, the documents with the matched urls should have the 
> subcollection field in the index.
> After that, since subcollection also includes a query plugin, you can run 
> searches like
>       java subcollection:nutch
> to limit the search to the nutch collection. You can consult the README 
> file in the plugin's directory.
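
The steps in the reply above can be sketched as two configuration fragments. This is a minimal sketch under stated assumptions, not a verified setup. The first fragment assumes the plugin is enabled through the plugin.includes property in conf/nutch-site.xml; every plugin name other than subcollection is an assumption and should be copied from your own nutch-default.xml. The second fragment fills in the empty whitelist from the quoted configuration; the URL is a hypothetical placeholder, and the file name conf/subcollections.xml is an assumption (the reply calls it conf/subcollection.xml).

```xml
<!-- conf/nutch-site.xml (sketch): append "subcollection" to the plugin list.
     The other plugin names are assumed; copy the value from your nutch-default.xml. -->
<configuration>
	<property>
		<name>plugin.includes</name>
		<value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|subcollection</value>
	</property>
</configuration>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- conf/subcollections.xml (assumed file name): same structure as the quoted
     config, with a placeholder URL pattern added to the whitelist. -->
<subcollections>
	<subcollection>
		<name>nutch</name>
		<id>nutch</id>
		<!-- Hypothetical pattern: replace with the URL prefix you want to search. -->
		<whitelist>http://lucene.apache.org/nutch/</whitelist>
		<blacklist />
	</subcollection>
</subcollections>
```

An empty whitelist would likely match no URLs, which could be one reason for 0 hits. Since the reply says the subcollection field is added at indexing, pages indexed before the plugin was enabled would probably need to be indexed again before a query such as subcollection:nutch <word> can match them.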
