lucene-java-user mailing list archives

From "gekkokid"
Subject Re: Hi Experts
Date Wed, 29 Mar 2006 15:41:56 GMT
    Lucene is a component that indexes data and lets you search that 
indexed data. To use it you need to be able to program in Java (various 
ports for other languages are available), or find a crawler you can adapt 
to download the required data from the internet (which still requires basic 
knowledge of Java). From what I can tell, you want a tool that downloads 
files, indexes them, and lets you search them, so you should use Nutch. It 
is an application, unlike Lucene, which is a software component that 
interfaces with the programmer's code to provide a search facility of some 
sort for their 


----- Original Message ----- 
From: "Babu, KameshNarayana (GE, Research, consultant)" 
To: <>
Sent: Wednesday, March 29, 2006 11:14 AM
Subject: RE: Hi Experts

> Thanks Aditya,
> Lucene is used only to search on the local machine, right? How can Lucene 
> search the internet?
> Are there any tools that can index the internet themselves and display the 
> results? I know this is very silly.
> -----Original Message-----
> From: Aditya Liviandi []
> Sent: Wednesday, March 29, 2006 11:34 AM
> To:
> Subject: RE: Hi Experts
> The way Lucene works is that you need to have the index first.
> Only then can you search it.
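
To make the "index first, search second" point concrete, here is a toy sketch in plain Java. It is not Lucene's API; the class name TinyIndex, its methods, and the page ids are made up for illustration. It only shows that a search is a lookup in something built beforehand, so a document that was never indexed cannot be found.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical, NOT Lucene's API): illustrates that
// indexing has to happen before any search can return results.
class TinyIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Indexing step: split the text into lowercase terms and record the
    // document id under each term.
    void add(String docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Search step: a plain lookup in the prebuilt index.
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase());
        return hits == null ? Collections.<String>emptySet() : hits;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        // "page1" and "page2" are made-up document ids.
        index.add("page1", "Lucene is a search library");
        index.add("page2", "Nutch is a crawler built on Lucene");
        System.out.println(index.search("lucene"));  // [page1, page2]
        System.out.println(index.search("crawler")); // [page2]
        System.out.println(index.search("google"));  // []
    }
}
```

A real Lucene index also handles analysis, scoring, and on-disk storage, none of which this sketch attempts.
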
> So if you want to search within a given URL, you need to somehow create
> an index of all the webpages within that URL. If the webserver behind
> that URL is also yours, that is not a big deal.
> But if it is an external URL, you need a crawler
> (which basically collects all the linked documents under the URL). However,
> you will not be able to get all the documents under the URL: those that are
> not linked from any other document will not be reached by the crawler
> unless you manually supply the URL of that document to the crawler.
> Otherwise I don't see how you can figure out the existence of that
> document.
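
The reachability point can be sketched the same way. This is a minimal illustration in plain Java, assuming the site is just an in-memory map from each page to the pages it links to; the class name CrawlSketch and all paths are hypothetical, and nothing is actually downloaded. A crawler can only follow links from its starting URLs, so a page nobody links to is found only if you supply it as an extra seed.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a breadth-first walk over a link graph held in
// memory. A real crawler would fetch and parse pages instead.
class CrawlSketch {
    // Returns every page reachable from the seeds by following links.
    static Set<String> reachable(Map<String, List<String>> links, List<String> seeds) {
        Set<String> seen = new LinkedHashSet<>(seeds);
        Deque<String> queue = new ArrayDeque<>(seeds);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            List<String> targets = links.get(page);
            if (targets == null) {
                targets = Collections.emptyList();
            }
            for (String target : targets) {
                // add() returns false for pages already seen, which avoids loops
                if (seen.add(target)) {
                    queue.add(target);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up site: /orphan.html exists but no other page links to it.
        Map<String, List<String>> links = new HashMap<>();
        links.put("/index.html", Arrays.asList("/a.html", "/b.html"));
        links.put("/a.html", Arrays.asList("/b.html"));
        links.put("/orphan.html", Arrays.asList("/a.html"));

        // Starting from the front page alone never reaches the orphan.
        System.out.println(reachable(links, Arrays.asList("/index.html")));
        // prints [/index.html, /a.html, /b.html]

        // Supplying the orphan's URL by hand is the only way to include it.
        System.out.println(reachable(links, Arrays.asList("/index.html", "/orphan.html")));
        // prints [/index.html, /orphan.html, /a.html, /b.html]
    }
}
```
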
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
