From: "Friaa Nafaa"
Reply-To: friaa@excite.com
To: lucene-user@jakarta.apache.org
Subject: Re: Indexing distant web sites
Date: Mon, 4 Nov 2002 08:29:00 -0500 (EST)
Message-Id: <20021104132900.EE0F7299E6@xmxpita.excite.com>

Thank you. I installed this crawler and ran it, but I would like to index the web site, not just list the links the crawler visited. Is there a way to search a web page with Lucene, using this crawler to visit the pages?

Thanks

--- On Mon 11/04, Karl Marx <karl@gan.no> wrote:
From: Karl Marx [mailto: karl@gan.no]
To: lucene-user@jakarta.apache.org
Date: Mon, 4 Nov 2002 12:31:50 +0100
Subject: Re: Indexing distant web sites

As stated in the official FAQ, Lucene doesn't implement a web crawler. You can, however, use a self-made crawler or customize a crawler framework like websphinx (http://www-2.cs.cmu.edu/~rcm/websphinx/) to retrieve HTML documents from a site and then feed them to Lucene.

Regards,
karl øie

On Monday, Nov 4, 2002, at 11:49 Europe/Oslo, Friaa Nafaa wrote:
> Hello,
>
> Is there any way to index web sites with Lucene, assuming we know
> only the URL of the site?
>
> For local use we pass Lucene the full directory tree of our site
> (containing all the documents) and begin the indexing operation, but
> what do I do when I want to index a remote site on the web?
>
> For example, I installed Lucene on my computer and I would like to
> index the site http://www.excite.com ...
>
> Thanks
>
> _______________________________________________
> Join Excite! - http://www.excite.com
> The most personalized portal on the Web!
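
The approach described in the reply (crawl the site, then hand each retrieved page to Lucene) can be sketched roughly as follows. This is only a minimal illustration in Python using the standard library, not websphinx and not anyone's actual setup. The names crawl_site, PageParser, fetch_html and the index_page callback are all hypothetical; index_page stands in for the code that would build a Lucene Document from the URL and page text and add it to an index through an IndexWriter.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def fetch_html(url):
    """Default fetcher: download a page and decode it leniently."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_site(start_url, index_page, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl restricted to the host of start_url.

    index_page(url, text) is a hypothetical hook: this is where each page
    would be handed to the indexer instead of merely being listed.
    Returns the number of pages passed to index_page.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    indexed = 0
    while queue and indexed < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # unreachable page: skip it and keep crawling
        parser = PageParser()
        parser.feed(html)
        index_page(url, " ".join(parser.text_parts))
        indexed += 1
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed
```

Calling crawl_site("http://www.excite.com", index_page) with a suitable index_page function would visit up to max_pages pages on that host and pass the text of each one to the indexer. The point of the callback is the one raised in the question: the crawler only discovers and downloads pages, and a separate indexing step is what makes them searchable.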