lucene-java-user mailing list archives

From "Friaa Nafaa" <fr...@excite.com>
Subject Re: Indexing distant web sites
Date Mon, 04 Nov 2002 13:29:00 GMT
 
Thank you. I installed this crawler and ran it, but I would like to index the web site,
not just list the links the crawler visited. Is there a way to search web pages with Lucene,
using this crawler to visit the pages? Thanks.

--- On Mon 11/04, Karl Marx <karl@gan.no> wrote:
From: Karl Marx [mailto: karl@gan.no]
To: lucene-user@jakarta.apache.org
Date: Mon, 4 Nov 2002 12:31:50 +0100
Subject: Re: Indexing distant web sites

As stated in the official FAQ, Lucene doesn't implement a web crawler. You can, however,
use a self-made crawler or customize a crawler framework like WebSPHINX
(http://www-2.cs.cmu.edu/~rcm/websphinx/) to retrieve HTML documents from a site and then
feed them to Lucene.

Regards,
karl øie

On Monday, Nov 4, 2002, at 11:49 Europe/Oslo, Friaa Nafaa wrote:
> Hello, is there any way to index web sites with Lucene, assuming we know
> only the URL of the site? For local use we pass Lucene the full directory
> tree of our site (containing all the documents) and begin the indexing
> operation, but what do I do when I would like to index a distant site on
> the web? For example, I installed Lucene on my computer and I would like
> to index the site http://www.excite.com ... Thanks
>
> _______________________________________________
> Join Excite! - http://www.excite.com
> The most personalized portal on the Web!
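
To make the "retrieve HTML documents and then feed them to Lucene" step more concrete, here is a minimal sketch in plain Java. It is not WebSPHINX code and not part of Lucene: the class CrawlAndIndexSketch and the PageIndexer interface are hypothetical names invented for illustration, and the regex-based HTML handling is a simplification that a real HTML parser would replace. The idea is that each page the crawler fetches is reduced to plain text and passed to an indexing hook, while the links found on the page are queued for later visits.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the glue between a crawler and an indexer.
final class CrawlAndIndexSketch {

    /**
     * Hypothetical hook. A real implementation would presumably build a
     * Lucene Document from the url and text and add it with an IndexWriter.
     */
    interface PageIndexer {
        void index(String url, String text);
    }

    // Simplified href matcher; captures the target without any #fragment.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*href\\s*=\\s*[\"']([^\"'#]+)[^\"']*[\"']",
        Pattern.CASE_INSENSITIVE);

    /** Returns absolute, de-duplicated links that stay on the base URL's host. */
    static List<String> extractLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim());
                boolean sameHost = base.getHost() != null
                    && base.getHost().equalsIgnoreCase(resolved.getHost());
                if (sameHost && !links.contains(resolved.toString())) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it and keep going.
            }
        }
        return links;
    }

    /** Crude tag stripping: drops script/style bodies, then all tags. */
    static String extractText(String html) {
        String noScripts = html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ");
        String noTags = noScripts.replaceAll("(?s)<[^>]*>", " ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    /** One crawl step: index the page text, queue its same-site links. */
    static void process(String url, String html, PageIndexer indexer, List<String> frontier) {
        indexer.index(url, extractText(html));
        frontier.addAll(extractLinks(url, html));
    }
}
```

With a crawler framework, process(...) would be called from whatever per-page callback the framework offers, so that the crawler only does the visiting and the indexer receives the page contents rather than a list of visited links.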

