lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ramakrishna <>
Subject RE: [ANNOUNCE] Web Crawler
Date Tue, 16 Jul 2013 04:03:40 GMT
so, There is no way to crawl if they blocked their web-sites to crawl ? I've
one idea, But seems little bit foolish(not works/I've to Modify whole
architecture) still I'm telling, If I use Html-Parser(Jsoup) Instead of
fetcher then? Anyhow Html-parser easily takes all contents of the
web-page.Can i do this.. I think rest of the
parts(segments,updater,indexer,parser) I've to write all these things, I
think it'll(Html-parser) not work with the already existing (parts) if i
replace fetcher with Html-parser.

View this message in context:
Sent from the Lucene - Java Users mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message