lucene-java-user mailing list archives

From "Babu, KameshNarayana (GE, Research, consultant)" <kameshnarayana.b...@ge.com>
Subject RE: Hi Experts
Date Wed, 29 Mar 2006 10:14:15 GMT
Thanks Aditya,
Lucene is only used to search on the local machine, right? How can Lucene search the internet?
Are there any tools that can index content on the internet by themselves and display the results?
I know this may be a silly question.

-----Original Message-----
From: Aditya Liviandi [mailto:adityal@i2r.a-star.edu.sg]
Sent: Wednesday, March 29, 2006 11:34 AM
To: java-user@lucene.apache.org
Subject: RE: Hi Experts


The way Lucene works is that you need to have an index first.
Only then can you search it.
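
To make the "index first, search second" order concrete, here is a minimal
sketch of a toy inverted index. It is NOT Lucene's API: the class TinyIndex
and its methods add and search are hypothetical names invented for this
illustration, and real Lucene adds analysis, scoring, and on-disk storage
on top of the same basic idea.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for a search index (hypothetical, not Lucene):
// maps each term to the set of document ids that contain it.
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Indexing step: split the text into lowercase terms and record the document id for each.
    void add(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Search step: only documents that were added earlier can ever be returned.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
    }
}
```

Searching an empty TinyIndex returns nothing, whatever documents exist
elsewhere; a page becomes findable only after add has been called for it.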

So if you want to search within a given URL, you first need to create
an index of all the web pages under that URL. If the web server behind
that URL is also yours, that is not a big deal.


But if it is an external URL, you will need a crawler (which basically
follows links to collect all the linked documents under the URL). However,
you will not be able to get every document under the URL: a document that
no other document links to will not be reached by the crawler unless you
manually supply its URL, because the crawler has no other way to discover
that the document exists.
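
The reachability limit described above can be sketched as a breadth-first
traversal. This is only an illustration under stated assumptions: the "web"
is a hypothetical in-memory map from each URL to the links found on that
page, so nothing is fetched over the network, and the names CrawlSketch and
crawl are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a crawler can only discover pages by following links from its seeds.
class CrawlSketch {

    // Breadth-first traversal from the seed URLs; returns every URL that was reached.
    // The map stands in for "fetch the page and extract its links".
    static Set<String> crawl(Map<String, List<String>> web, List<String> seeds) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(seeds);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            if (!web.containsKey(url) || !visited.add(url)) {
                continue; // page does not exist, or was already visited
            }
            queue.addAll(web.get(url)); // follow every link on this page
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = Map.of(
            "/index.html", List.of("/a.html", "/b.html"),
            "/a.html", List.of("/b.html"),
            "/b.html", List.<String>of(),
            "/orphan.html", List.<String>of()); // no page links here

        // Seeded only with the home page: /orphan.html is never discovered.
        System.out.println(crawl(web, List.of("/index.html")));
        // Supplying the orphan's URL by hand is the only way to reach it.
        System.out.println(crawl(web, List.of("/index.html", "/orphan.html")));
    }
}
```

Each URL returned by crawl is a page that could then be handed to the
indexing step; anything outside that set never makes it into the index.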




