lucene-java-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com>
Subject Re: A way to download URLs and index better ?
Date Sat, 16 Jan 2010 07:26:26 GMT
> Hi everyone, please help me with this question:
> I need to download some web pages from a list of URLs (about
> 200 links) and then index them with Lucene.
> This list is not fixed, because it depends on how my process
> is defined.
> Currently, in my web application, I wrote a class for
> downloading, but its download time is too long.
> 
> Please recommend a Java library suited to my situation for
> optimizing the downloading.
> Examples would be very welcome (INPUT: list of URLs;
> OUTPUT: web page content, or an indexed repository).
> Thank you very much.

Probably the most famous ones:

http://lucene.apache.org/nutch/
http://crawler.archive.org/
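
If a full crawler turns out to be more than a fixed list of about 200 URLs needs, the JDK alone may be enough. Below is a minimal sketch, assuming the slow part is that pages are fetched one at a time and that Java 9 or later is available. The class name ParallelFetcher, the pool size of 16, the timeouts, and the UTF-8 decoding are all illustrative assumptions, not something taken from the original question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: downloads a list of URLs in parallel with a fixed thread pool.
class ParallelFetcher {

    // Downloads one page as text. Assumes UTF-8, which real pages may not use.
    static String download(String url) {
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(5_000);   // illustrative timeout values
            conn.setReadTimeout(10_000);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // INPUT: list of URLs. OUTPUT: map from URL to page content, in input order.
    // URLs whose download fails are reported on stderr and left out of the map.
    static Map<String, String> fetchAll(List<String> urls, int threads,
                                        Function<String, String> fetcher) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetcher.apply(url));
            }
            // invokeAll blocks until every task has finished and returns
            // the futures in the same order as the tasks.
            List<Future<String>> futures = pool.invokeAll(tasks);
            Map<String, String> pages = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                try {
                    pages.put(urls.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    System.err.println("Skipping " + urls.get(i) + ": " + e.getCause());
                }
            }
            return pages;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while downloading", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // URLs are taken from the command line for this sketch.
        Map<String, String> pages = fetchAll(Arrays.asList(args), 16, ParallelFetcher::download);
        pages.forEach((url, html) -> System.out.println(url + " -> " + html.length() + " chars"));
    }
}
```

The returned map can then be handed to whatever code already adds documents to the Lucene index. Passing the fetcher in as a parameter keeps the threading logic separate from the HTTP code, so either part can be swapped out or tested on its own.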



      


