lucene-solr-user mailing list archives

From Robert Hume <rhum...@gmail.com>
Subject Question about best way to architect a Solr application with many data sources
Date Wed, 22 Feb 2017 00:57:13 GMT
To learn how to properly use Solr, I'm building a little experimental
project with it to search for used car listings.

Car listings appear in a variety of places: central sites like
Craigslist, and also many individual used car dealership websites.

I am wondering, should I:

(a) deploy a Solr search engine and build individual indexers for every
type of web site I want to find listings on?

or

(b) build my own database to store car listings, and then build services
that scrape data from different sites and feed entries into the database;
then point my Solr search to my database, one simple source of listings?

My concerns are:

With (a) ... I have to be smart enough to understand all those different
data sources and remove/update listings when they change; will this be
harder to do with custom Solr indexers than writing something from scratch?

With (b) ... I'm maintaining a huge database of all my listings, which seems
redundant; Google doesn't make a *copy* of everything on the internet, it
just knows it's there. Is maintaining my own database a bad design?
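
To make (b) concrete, here is a rough sketch of the kind of pipeline I have
in mind. It is only an illustration under assumptions: the table layout, the
field names (source, url, title, price), and the helper names are all made
up, an in-memory SQLite database stands in for "my own database", and
nothing is actually sent to Solr. The two JSON bodies follow what I
understand Solr's JSON update format to be (an array of documents to add,
and a "delete" command with a list of ids).

```python
import hashlib
import json
import sqlite3


def listing_id(source, url):
    """Stable id, so re-scraping the same listing updates it instead of duplicating it."""
    return hashlib.sha1(f"{source}|{url}".encode("utf-8")).hexdigest()


def open_store():
    """Hypothetical listings store; a real one would live on disk."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE listings ("
        "id TEXT PRIMARY KEY, source TEXT, url TEXT, "
        "title TEXT, price INTEGER, crawl INTEGER)"
    )
    return db


def save_crawl(db, source, crawl, scraped):
    """Upsert everything seen in this crawl; return ids that have disappeared."""
    for item in scraped:
        db.execute(
            "INSERT OR REPLACE INTO listings VALUES (?, ?, ?, ?, ?, ?)",
            (listing_id(source, item["url"]), source, item["url"],
             item["title"], item["price"], crawl),
        )
    # Anything from this source not seen in the current crawl is treated as removed.
    stale = [row[0] for row in db.execute(
        "SELECT id FROM listings WHERE source = ? AND crawl < ? ORDER BY id",
        (source, crawl),
    )]
    db.execute("DELETE FROM listings WHERE source = ? AND crawl < ?", (source, crawl))
    db.commit()
    return stale


def solr_payloads(db, stale_ids):
    """Build (add_body, delete_body) JSON strings; posting them is left out."""
    docs = [
        {"id": r[0], "source": r[1], "url": r[2], "title": r[3], "price": r[4]}
        for r in db.execute(
            "SELECT id, source, url, title, price FROM listings ORDER BY id"
        )
    ]
    return json.dumps(docs), json.dumps({"delete": stale_ids})
```

The point of the sketch is that each scraper only has to produce a list of
plain records; deciding what changed or vanished happens once, in the
database, and Solr is fed from there.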

Thanks for reading!
