lucene-java-user mailing list archives

From "gekkokid" ...@gekkokid.org.uk>
Subject Re: Hi Experts
Date Wed, 29 Mar 2006 15:41:56 GMT
Hi,
    Lucene is a component that indexes data and lets you search that 
indexed data. To use it you need to be able to program in Java (various 
ports for other languages are available), or find a crawler you can adapt 
to download the required data from the internet (this still requires basic 
knowledge of Java). From what I can tell, you want a tool that downloads 
files, indexes them, and lets you search them, so you should use Nutch. 
Nutch is an application, unlike Lucene, which is a software component that 
interfaces with the programmer's code to provide a search facility for 
their application.
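
To make the "component" idea concrete, here is a minimal sketch of the 
index-then-search pattern in plain Java. It is a toy, not Lucene's API: the 
class TinyIndex and its methods are made up for illustration, and a real 
Lucene index does far more (analysis, scoring, on-disk storage).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index. NOT Lucene's API; all names here are hypothetical.
class TinyIndex {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Indexing step: split the text into lowercase terms and record
    // which document each term came from. Returns the new document id.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Search step: a lookup in the index; the original text is not rescanned.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("Lucene is a search library");
        index.add("Nutch is a crawler built on Lucene");
        System.out.println(index.search("lucene"));  // ids of both documents
        System.out.println(index.search("crawler")); // id of the second document only
    }
}
```

Your own program decides what text gets passed to add(), which is the part 
a crawler such as Nutch does for you.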

_gk

----- Original Message ----- 
From: "Babu, KameshNarayana (GE, Research, consultant)" 
<kameshnarayana.babu@ge.com>
To: <java-user@lucene.apache.org>
Sent: Wednesday, March 29, 2006 11:14 AM
Subject: RE: Hi Experts


> Thanks Aditya,
> Lucene is only used to search on the local machine, right? How can Lucene 
> search on the internet?
> Are there any tools that can index the internet themselves and display the 
> results? I know this is very silly.
>
> -----Original Message-----
> From: Aditya Liviandi [mailto:adityal@i2r.a-star.edu.sg]
> Sent: Wednesday, March 29, 2006 11:34 AM
> To: java-user@lucene.apache.org
> Subject: RE: Hi Experts
>
>
> The way Lucene works is that you need to have the index first.
> Only then can you search it.
>
> So if you want to search within a given URL, you need to somehow create
> the index of all the webpages within that URL. If the webserver behind
> that URL is also yours, then that would not be a big deal.
>
>
> But if it is an external URL, then you would need a crawler (which
> basically collects all the documents linked from that URL). However,
> you will not be able to get every document under the URL: a document
> that no other document links to will not be reached by the crawler
> unless you manually supply its URL. Otherwise I don't see how the
> crawler could discover that the document exists.
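
The reachability point above can be sketched as a breadth-first walk over 
a link graph. This is only an illustration: the page names are invented, 
the graph is held in memory, and no real crawler (Nutch or otherwise) is 
being reproduced here.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy crawl over an in-memory link graph. Page names are hypothetical
// and no network access is performed.
class CrawlSketch {
    // Breadth-first traversal: returns every page reachable from the
    // seeds by following links, in the order the pages were discovered.
    static Set<String> crawl(Map<String, List<String>> links, Collection<String> seeds) {
        Set<String> seen = new LinkedHashSet<>(seeds);
        Deque<String> queue = new ArrayDeque<>(seeds);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            for (String target : links.getOrDefault(page, Collections.emptyList())) {
                if (seen.add(target)) { // true only the first time a page is seen
                    queue.add(target);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("/index.html", Arrays.asList("/a.html", "/b.html"));
        links.put("/a.html", Arrays.asList("/b.html"));
        links.put("/orphan.html", Arrays.asList("/a.html")); // no page links to this one

        // Starting from the front page only, /orphan.html is never found.
        System.out.println(crawl(links, Arrays.asList("/index.html")));
        // Supplying its URL as an extra seed is the only way to include it.
        System.out.println(crawl(links, Arrays.asList("/index.html", "/orphan.html")));
    }
}
```
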
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> 



