Message-ID: <45302AEB.8090304@stefan-neufeind.de>
Date: Sat, 14 Oct 2006 02:10:19 +0200
From: Stefan Neufeind
To: nutch-user@lucene.apache.org
Subject: Re: I can not query myplugin in field category:test
In-Reply-To: <45301428.1040600@yahoo.com.ar>

Please do share it. I'd appreciate it, and I guess a lot of others would
as well. And I bet it could even be enhanced by the community. :-)

Regards,
 Stefan

Ernesto De Santis wrote:
> I wrote a url-category-indexer.
>
> It works with a .properties file that maps URLs, written as regular
> expressions, to categories.
> Example:
>
> http://www.misite.com/videos/.*=videos
>
> If it seems useful, I can share it.
>
> Maybe it would be better to configure it in an .xml file.
>
> Regards,
> Ernesto.
>
> Stefan Neufeind wrote:
>> Alvaro Cabrerizo wrote:
>>
>>> Have you included a node describing your new query filter in
>>> plugin.xml?
>>>
>>> 2006/10/11, xu nutch:
>>>
>>>> I have a question about my plugin with an indexing filter and a
>>>> query filter. Can you help me?
>>>> -------------------------------------
>>>> In MoreIndexingFilter.java I added:
>>>> doc.add(new Field("category", "test", false, true, false));
>>>> -------------------------------------
>>>>
>>>> -------------------------------------
>>>>
>>>> package org.apache.nutch.searcher.more;
>>>>
>>>> import org.apache.nutch.searcher.RawFieldQueryFilter;
>>>>
>>>> /** Handles "category:" query clauses, causing them to search the
>>>>  * field indexed by MoreIndexingFilter.
>>>>  */
>>>> public class CategoryQueryFilter extends RawFieldQueryFilter {
>>>>   public CategoryQueryFilter() {
>>>>     super("category");
>>>>   }
>>>> }
>>>> -------------------------------------
>>>> -------------------------------------
>>>>
>>>> <property>
>>>>   <name>plugin.includes</name>
>>>>   <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|more)|query-(basic|site|url|more)</value>
>>>>   <description>Regular expression naming plugin directory names to
>>>>   include. Any plugin not matching this expression is excluded.
>>>>   In any case you need at least include the nutch-extensionpoints
>>>>   plugin. By default Nutch includes crawling just HTML and plain
>>>>   text via HTTP, and basic indexing and search plugins.
>>>>   </description>
>>>> </property>
>>>> -------------------------------------
>>>>
>>>> When I query "category:test" in Luke, it works! But when I query
>>>> "category:test" through the Tomcat web site, no results are returned.
>>>>
>>
>> In case you get the search working:
>> How do you plan to categorize URLs/sites? I'm looking for a solution
>> there, since I haven't yet managed to implement something based on
>> URL-prefix filters that maps URLs to categories.
>>
>>
>> Regards,
>>  Stefan
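Ernesto only outlines his url-category-indexer, so what follows is a hypothetical, Nutch-independent sketch of the same idea, not his code: the class name UrlCategoryMapper, the "#" comment syntax, the first-matching-rule-wins policy and the second ("news") rule are all assumptions made for illustration. One pitfall it sidesteps: java.util.Properties.load ends a key at the first unescaped '=', ':' or whitespace, so loading the line http://www.misite.com/videos/.*=videos that way would produce the key "http" unless the colon is escaped as http\://... . The sketch therefore parses each rule itself and splits it at the last '=', which also lets the regex contain '=' (as in a query string).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical URL-to-category mapper (not part of Nutch).
 * Each rule line has the form "<url-regex>=<category>".
 */
public class UrlCategoryMapper {

    // Insertion-ordered, so rules are tried in file order and the first match wins.
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    /**
     * Parses rule text line by line. Blank lines and lines starting with '#'
     * are skipped. Each rule is split at its LAST '=', so categories must not
     * contain '=' but regexes may. Throws IllegalArgumentException (or its
     * subclass PatternSyntaxException) for malformed rules.
     */
    public UrlCategoryMapper(String ruleText) {
        for (String raw : ruleText.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int sep = line.lastIndexOf('=');
            if (sep <= 0 || sep == line.length() - 1) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            rules.put(Pattern.compile(line.substring(0, sep).trim()),
                      line.substring(sep + 1).trim());
        }
    }

    /** Returns the category of the first rule whose regex matches the whole URL, or null. */
    public String categorize(String url) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            if (rule.getKey().matcher(url).matches()) {
                return rule.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UrlCategoryMapper mapper = new UrlCategoryMapper(
            "# url-regex=category\n"
            + "http://www.misite.com/videos/.*=videos\n"
            + "http://www.misite.com/.*\\?page=news.*=news\n");
        System.out.println(mapper.categorize("http://www.misite.com/videos/intro.html"));       // videos
        System.out.println(mapper.categorize("http://www.misite.com/index.jsp?page=news&id=7")); // news
        System.out.println(mapper.categorize("http://www.other.org/"));                          // null
    }
}
```

In an indexing filter, the category returned for a page's URL would be the value written into the "category" field that CategoryQueryFilter searches. A rule of the form <prefix>.* behaves like the URL-prefix mapping Stefan asks about; strictly, unescaped dots match any character, so escape them as \. if exact prefixes matter.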