nutch-user mailing list archives

From prashant_nutch <prashant.pa...@in.v2solutions.com>
Subject Re: Help on Activation of Subcollection at Indexing & searching
Date Fri, 30 Mar 2007 12:59:36 GMT

Thanks for your helpful comments on subcollection,
but I still have some questions.
1. Does enabling subcollection in nutch-site.xml only take effect at crawl
time? Is it possible to apply it directly to an existing index (that is, at
search time)?
2. Can you explain what you meant in your message by
  "subcollection also includes a query plugin"?

I followed the steps you mentioned,
but when I run a query like

subcollection:<name of subcollection> <word for search>
I still get 0 hits.
Can you explain subcollection in more detail? Our aim is to search only
specific URLs.
Is there any way to do this other than subcollection?

Enis Soztutar wrote:
> 
> prashant_nutch wrote:
>> Is subcollection useful for searching specific URLs?
>> How do we activate subcollection at indexing and searching time?
>>
>> In conf/subcollection,
>> if we include our URLs in the whitelist, is the search then limited to
>> those URLs?
>> The command for searching a subcollection:
>>
>> subcollection:<name of subcollection> <word for specific URL>
>>
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <subcollections>
>> 	<subcollection>
>> 		<name>nutch</name>
>> 		<id>nutch</id>
>> 		<whitelist>
>> 			http://lucene.apache.org/nutch/
>> 			http://wiki.apache.org/nutch/
>> 		</whitelist>
>> 		<blacklist />
>> 	</subcollection>
>> </subcollections>
>>
>> Can anybody explain how the overall thing should work?
>> Is it useful for searching specific URLs? (We are using Nutch 0.8.1.)
>>
>>   
> Subcollection is a very useful way to group a set of URLs and then 
> assign a label to them. You can use it to limit searching to certain
> URLs.
> 
> You should first enable subcollection in the nutch-site.xml file.
> Then you should add collections to the conf/subcollection.xml file.
> After indexing, the documents with the matched urls should have the 
> subcollection field in the index.
> After that, since subcollection also includes a query plugin, you can do 
> searches like
> 
>       java subcollection:nutch
> 
> to limit the search to the nutch collection. You can consult the README 
> file in the plugin's directory.
> 
> 
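
For reference, enabling the plugin as described above usually means adding
"subcollection" to the plugin.includes property in conf/nutch-site.xml. The
fragment below is a minimal sketch, not a verified configuration: the other
plugin names in the value are assumed from a typical 0.8.x default and should
be copied from your own nutch-default.xml.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed plugin list: copy the value from your own nutch-default.xml
       and append "subcollection" rather than using this one verbatim. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|subcollection</value>
  </property>
</configuration>
```

Since the subcollection field is added to documents at indexing time, pages
indexed before the plugin was enabled would most likely need to be re-indexed
before a subcollection: query can match them.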

-- 
View this message in context: http://www.nabble.com/Help-on-Activation-of-Subcollection-at-Indexing---searching-tf3490590.html#a9752653
Sent from the Nutch - User mailing list archive at Nabble.com.

