lucene-java-user mailing list archives

From Aditya Liviandi <adit...@i2r.a-star.edu.sg>
Subject RE: Hi Experts
Date Wed, 29 Mar 2006 06:04:09 GMT
Lucene searches an index, not the documents themselves, so you need to
build the index first. Only then can you search it.
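
To illustrate the "index first, search second" order, here is a toy
inverted index in plain Java. This is only a sketch of the idea and not
Lucene's API or implementation: the class name ToyIndex and its add() and
search() methods are made up for this example, and the tokenizing is
deliberately naive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index, NOT Lucene: maps each term to the documents containing it.
public class ToyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Indexing step: split the text into lowercase terms and record the document id.
    public void add(String docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Search step: only consults the index, never the original documents.
    public Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        // Nothing has been indexed yet, so this search finds nothing.
        System.out.println(index.search("lucene"));
        index.add("doc1", "Lucene searches an index");
        index.add("doc2", "A crawler collects documents");
        // After indexing, doc1 can be found.
        System.out.println(index.search("lucene"));
    }
}
```

A document that was never passed to add() can never show up in a search,
which is the same situation you are in with pages that were never indexed.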

So if you want to search within a given URL, you first need to build an
index of all the web pages under that URL. If the web server behind that
URL is your own, that is not a big deal.

But if it is an external URL, you will need a crawler, which basically
starts at that URL and collects all the documents linked from it.
However, you will not be able to get every document under the URL. A
document that is not linked from any other document will not be reached
by the crawler unless you manually supply its URL. Otherwise I don't see
how the crawler could figure out that the document exists.
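
Here is a rough sketch of why unlinked documents are missed. It assumes
the site is given as an in-memory map from each page to the pages it
links to, so there is no real HTTP fetching or HTML parsing. CrawlSketch,
crawl() and the page names are all hypothetical, and a real crawler would
also need politeness delays, robots.txt handling and so on.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: breadth-first crawl over an in-memory link graph.
public class CrawlSketch {

    // Returns every page reachable from the seeds by following links.
    public static Set<String> crawl(Map<String, List<String>> links, List<String> seeds) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(seeds);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            // Skip pages that do not exist on the site or were already visited.
            if (!links.containsKey(url) || !visited.add(url)) {
                continue;
            }
            // "Fetch" the page and queue every link found on it.
            queue.addAll(links.get(url));
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up site: /orphan.html exists but no other page links to it.
        Map<String, List<String>> site = Map.of(
            "/index.html", List.of("/a.html", "/b.html"),
            "/a.html", List.of("/b.html"),
            "/b.html", List.of("/index.html"),
            "/orphan.html", List.of());

        // Starting from the front page alone never reaches /orphan.html.
        System.out.println(crawl(site, List.of("/index.html")));
        // Supplying the orphan's URL by hand is the only way to include it.
        System.out.println(crawl(site, List.of("/index.html", "/orphan.html")));
    }
}
```

The pages returned by the crawl are what you would then feed to the
indexer.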



