lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Will Solr/Lucene crawl multi websites (aka a mini google with faceted search)?
Date Mon, 12 Sep 2011 01:05:32 GMT
Nope, there's nothing in Solr that crawls anything. You have to fetch the
documents from the websites and feed them into Solr yourself.

Or look at the Nutch project, which is designed for this kind of problem:
http://nutch.apache.org/about.html
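
As a rough illustration of what "feeding documents in yourself" can look like, here is a minimal Python sketch that builds an update request for Solr from already-scraped records. It is not from the original message. It assumes a Solr version whose /update handler accepts a JSON array of documents, and the host, the core name "products", and the field names are all hypothetical.

```python
import json
import urllib.request


def build_update_request(docs, solr_url="http://localhost:8983/solr/products"):
    """Build (but do not send) a POST request that adds docs to Solr.

    solr_url is a hypothetical core URL; adjust it to your installation.
    """
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url=solr_url + "/update?commit=true",  # commit so the docs become searchable
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical records, shaped the way a scraper might emit them.
docs = [
    {"id": "merchant-a-123", "name": "Digital camera", "price": 149.99},
]
request = build_update_request(docs)

# With a Solr instance running, the request could be sent with:
# urllib.request.urlopen(request)
```

The request is only constructed here, so the sketch runs without a Solr server. The crawling or scraping step that produces the records stays outside Solr, as described above.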

Best
Erick

On Sun, Sep 11, 2011 at 8:53 PM, dpt9876 <daninthetropics@gmail.com> wrote:
> Hi all,
> I am wondering if Solr will do the following for a project I am working on.
> I want to create a search engine with facets for potentially hundreds of
> websites.
> For example, crawling Amazon, buy.com, and eBay so that someone can search
> all 3 sites from my one website.
> (I realise there are better ways of doing the above example; it's for
> illustrative purposes.)
> Eventually I would extend that crawl to index, say, 200 or 1000
> merchants.
> Someone would come to my site and search for "digital camera".
>
> They would get results from all 3 indexes and, hopefully, dynamic facets, e.g.
> Price $100-200
> Price $200-300
> Resolution 1MP-2MP
>
> etc.
>
> Can this be done on the fly?
>
> I ask this because I am currently developing web scrapers to crawl these
> websites and dump that data into a db, and I was thinking of adding a Solr
> server to index my db.
>
> The problem with that approach is that crawling the world's e-commerce sites
> will take forever, when it seems Solr might do that for me? (I have read
> about multiple indexes etc.)
>
> Many thanks
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Will-Solr-Lucene-crawl-multi-websites-aka-a-mini-google-with-faceted-search-tp3328314p3328314.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
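
The price facets asked about in the quoted message can be requested at query time once the documents are indexed. The sketch below is not from the thread. It builds a select URL using Solr's range-faceting parameters and assumes a numeric field named "price" and a hypothetical core named "products"; it only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_facet_query(term, solr_url="http://localhost:8983/solr/products"):
    """Build a Solr select URL that asks for price-range facet counts.

    The core URL and the "price" field are assumptions for illustration.
    """
    params = [
        ("q", term),
        ("facet", "true"),
        ("facet.range", "price"),
        # Buckets of 100 from 100 to 300, i.e. 100-200 and 200-300.
        ("f.price.facet.range.start", "100"),
        ("f.price.facet.range.end", "300"),
        ("f.price.facet.range.gap", "100"),
        ("wt", "json"),
    ]
    return solr_url + "/select?" + urlencode(params)


url = build_facet_query("digital camera")
print(url)
```

Facet counts are computed per query over whatever is already in the index, so this covers the "on the fly" part for facets only; the documents still have to be crawled and indexed beforehand.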
