nutch-user mailing list archives

From Stefan Neufeind <apache....@stefan-neufeind.de>
Subject Re: I can not query myplugin in field category:test
Date Sat, 14 Oct 2006 00:10:19 GMT
Please do share it. I'd appreciate it, and I guess a lot of others would
as well. And I bet it could even be enhanced by the community. :-)


Regards,
 Stefan

Ernesto De Santis wrote:
> I did a url-category-indexer.
> 
> It works with a .properties file that maps URLs, written as regexps, to
> categories.
> example:
> 
> http://www.misite.com/videos/.*=videos
> 
> If it seems useful, I can share it.
> 
> Maybe it would be better to configure it in an .xml file.
> 
> Regards,
> Ernesto.
> 
> Stefan Neufeind wrote:
>> Alvaro Cabrerizo wrote:
>>  
>>> Have you included a node describing your new searcher filter in
>>> plugin.xml?
>>>
>>> 2006/10/11, xu nutch <nutchdev@gmail.com>:
>>>    
>>>> I have a question about my plugin's index filter and query filter.
>>>> Can you help me?
>>>> -------------------------------------
>>>> Added in MoreIndexingFilter.java:
>>>> doc.add(new Field("category", "test", false, true, false));
>>>> -------------------------------------
>>>>
>>>> --------------------------------------
>>>>
>>>>
>>>> package org.apache.nutch.searcher.more;
>>>>
>>>> import org.apache.nutch.searcher.RawFieldQueryFilter;
>>>>
>>>> /** Handles "category:" query clauses, causing them to search the
>>>>  * "category" field indexed by MoreIndexingFilter. */
>>>> public class CategoryQueryFilter extends RawFieldQueryFilter {
>>>>  public CategoryQueryFilter() {
>>>>    super("category");
>>>>  }
>>>> }
>>>> -----------------------------------------------
>>>> -----------------------------------------------
>>>>
>>>> <property>
>>>>  <name>plugin.includes</name>
>>>> <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|more)|query-(basic|site|url|more)</value>
>>>>
>>>>
>>>>  <description>Regular expression naming plugin directory names to
>>>>  include.  Any plugin not matching this expression is excluded.
>>>>  In any case you need at least include the nutch-extensionpoints
>>>> plugin. By
>>>>  default Nutch includes crawling just HTML and plain text via HTTP,
>>>>  and basic indexing and search plugins.
>>>>  </description>
>>>> </property>
>>>> -----------------------------------------------
>>>>
>>>> When I use Luke to query "category:test", it works!
>>>> But when I query "category:test" through the Tomcat website,
>>>> no results are returned.
>>>>       
>>
>> In case you get the search working:
>> How do you plan to categorize URLs/sites? I'm looking for a solution
>> there, since I haven't yet managed to implement something based on a
>> URL prefix filter that maps URLs to categories.
>>
>>
>> Regards,
>>  Stefan
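
The url-category-indexer that Ernesto describes above was not posted in this message, so the following is only a minimal sketch of the idea, not his implementation. It assumes one "regex=category" rule per line, as in his example, and the class name UrlCategoryMapper and method categoryFor are hypothetical. One caveat worth noting: java.util.Properties treats an unescaped ':' as a key/value separator, so a key such as http://www.misite.com/videos/.* would be cut at "http" unless the colon is escaped. This sketch therefore splits each line on its last '=' instead of using Properties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: maps URLs to categories using "regex=category" rules. */
public class UrlCategoryMapper {

    /** One rule: a compiled URL regex and the category it maps to. */
    private static final class Rule {
        final Pattern pattern;
        final String category;

        Rule(Pattern pattern, String category) {
            this.pattern = pattern;
            this.category = category;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /**
     * Builds the mapper from rule lines such as
     * "http://www.misite.com/videos/.*=videos".
     * Blank lines and lines starting with '#' are ignored.
     */
    public UrlCategoryMapper(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split on the LAST '=' so the regex part may itself contain '='.
            int eq = line.lastIndexOf('=');
            if (eq <= 0 || eq == line.length() - 1) {
                continue; // malformed rule: missing regex or category
            }
            String regex = line.substring(0, eq).trim();
            String category = line.substring(eq + 1).trim();
            rules.add(new Rule(Pattern.compile(regex), category));
        }
    }

    /**
     * Returns the category of the first rule whose regex matches the
     * whole URL, or null if no rule matches.
     */
    public String categoryFor(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).matches()) {
                return rule.category;
            }
        }
        return null;
    }
}
```

In an indexing filter, the value returned by categoryFor(url) could presumably be stored in the "category" field in place of the hard-coded "test" in the doc.add(...) line quoted above. Rules are checked in file order, so more specific patterns should come first.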
