lucene-solr-user mailing list archives

From Robert Hume
Subject Question about best way to architect a Solr application with many data sources
Date Wed, 22 Feb 2017 00:57:13 GMT
To learn how to properly use Solr, I'm building a little experimental
project with it to search for used car listings.

Car listings appear in a variety of places: central sites like
Craigslist, and also many individual used car dealership websites.

I am wondering, should I:

(a) deploy a Solr search engine and build individual indexers for every
type of website I want to find listings on, or

(b) build my own database to store car listings, and then build services
that scrape data from different sites and feed entries into the database;
then point my Solr search to my database, one simple source of listings?

My concerns are:

With (a) ... I have to be smart enough to understand all those different
data sources and remove/update listings when they change; will this be
harder to do with custom Solr indexers than writing something from scratch?

With (b) ... I'm maintaining a huge database of all my listings, which seems
redundant; Google doesn't make a *copy* of everything on the internet, it
just knows it's there.  Is maintaining my own database a bad design?
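
To make option (b) more concrete, here is a rough Python sketch of what I
have in mind. It is only an illustration under my own assumptions: the table
layout, the field names (source, url, title, price), and the helper names
listing_id, open_store, and sync_source are all made up for this example, and
I'm assuming a listing's URL is stable enough to derive an ID from. Each
scraper hands over the full set of listings it currently sees for one source;
the sync step compares that with the database and works out which documents
to add or update in Solr and which IDs to delete.

```python
import hashlib
import sqlite3


def listing_id(source, url):
    # Hypothetical ID scheme: hash of source name plus listing URL.
    return hashlib.sha1(f"{source}|{url}".encode("utf-8")).hexdigest()


def open_store(path=":memory:"):
    # Hypothetical schema; a real one would carry many more fields.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        "id TEXT PRIMARY KEY, source TEXT, url TEXT, title TEXT, price INTEGER)"
    )
    return db


def sync_source(db, source, scraped):
    """Reconcile one source's scraped listings with the database.

    Returns (docs, deletes): docs is a list of documents to add or update,
    deletes is a dict of the form {"delete": [ids]}.
    """
    current = {listing_id(source, item["url"]): item for item in scraped}
    existing = {
        row[0]: (row[1], row[2])
        for row in db.execute(
            "SELECT id, title, price FROM listings WHERE source = ?", (source,)
        )
    }

    docs = []
    for lid, item in current.items():
        # Only touch listings that are new or whose fields changed.
        if existing.get(lid) != (item["title"], item["price"]):
            db.execute(
                "INSERT OR REPLACE INTO listings (id, source, url, title, price) "
                "VALUES (?, ?, ?, ?, ?)",
                (lid, source, item["url"], item["title"], item["price"]),
            )
            docs.append({
                "id": lid,
                "source": source,
                "url": item["url"],
                "title": item["title"],
                "price": item["price"],
            })

    # Anything in the database that the scraper no longer sees is gone.
    gone = sorted(set(existing) - set(current))
    db.executemany("DELETE FROM listings WHERE id = ?", [(lid,) for lid in gone])
    db.commit()
    return docs, {"delete": gone}
```

As far as I understand, both return values could be JSON-encoded and posted
to Solr's /update handler (skipping the delete when the ID list is empty), so
the database would be the only place that needs to know about each site's
quirks.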

Thanks for reading!
