Date: 14 Sep 2005 14:54:28 -0000
From: adriano50@interfree.it
To: nutch-user@lucene.apache.org
Cc: nutch-agent@lucene.apache.org
Subject: crawl-urlfilter.txt

Hi, thank you for your hints but
I didn't give you the following information. I modified the file crawl-urlfilter.txt as follows:

#start crawl-urlfilter
# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|rtf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe)$

# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]

# accept anything else
+.
#end crawl-urlfilter

I started Nutch with this command line:

bin/nutch crawl urls -dir /home/paul/nutch-searcher.dir -depth 3 >& crawl.log

The file "urls" contains the URL of the following page:

TitleOfSite

Nutch crawls and fetches "welcome.html", but it doesn't work with MyServlet?menu=1. The servlet "MyServlet?menu=1" shows some links, but according to the log Nutch doesn't fetch any of them.

I hope the question is clear, and I look forward to your answer.

Adriano
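The comments in Nutch's stock crawl-urlfilter.txt describe how these rules are read: each non-comment line is a regular expression prefixed with "+" (include) or "-" (ignore), the first matching pattern determines whether a URL is included or ignored, and a URL that matches no pattern is ignored. The Python sketch below re-implements that first-match logic so the rules above can be checked against individual URLs. It is not Nutch's own code (Nutch's regex URL filter is written in Java), and it rests on stated assumptions: a pattern counts as matching if it occurs anywhere in the URL; the host example.com is hypothetical, because the page URL from the "urls" file did not survive in the archive; and the masked "probable queries" rule is assumed to be the stock "-[?*!@=]" line.

```python
import re

# Rules copied from the message above. ASSUMPTION: the masked
# "probable queries" line is the stock Nutch rule -[?*!@=].
RULES = r"""
# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|rtf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe)$

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]

# accept anything else
+.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, regex = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(regex)))
    return rules


def accepts(rules, url):
    """The first rule whose regex occurs anywhere in the URL decides; no match means ignore."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


if __name__ == "__main__":
    rules = parse_rules(RULES)
    # example.com is a hypothetical host standing in for the real site.
    for url in ["http://example.com/welcome.html",
                "http://example.com/MyServlet?menu=1"]:
        print(url, "->", "accept" if accepts(rules, url) else "reject")
    # http://example.com/welcome.html -> accept
    # http://example.com/MyServlet?menu=1 -> reject
```

Under these assumptions the servlet URL is rejected by the query-character rule, because "?" and "=" match it before the catch-all "+." is reached, while "welcome.html" falls through to "+." and is accepted.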