lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Question about best way to architect a Solr application with many data sources
Date Wed, 22 Feb 2017 01:20:33 GMT
I'll add that I _guarantee_ you'll want to re-index the data as you change
your schema and the like. You'll be able to do that much more quickly if the
data is stored locally somehow.

An RDBMS is not necessary, however. You could simply store the data on disk
in some format you could re-read and send to Solr.
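
As a minimal sketch of the "store it on disk, re-read it, send it to Solr" idea, the Python below keeps raw listings in a JSON Lines file and replays them in batches against Solr's JSON update handler. The file layout, the function names, the batch size, and the collection name are all assumptions made for illustration; nothing here is prescribed by the thread.

```python
import json
import urllib.request
from pathlib import Path
from typing import Iterable, Iterator


def append_docs(path: Path, docs: Iterable[dict]) -> None:
    """Append documents to a JSON Lines file, one JSON object per line."""
    with path.open("a", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")


def read_batches(path: Path, batch_size: int = 500) -> Iterator[list]:
    """Yield lists of at most batch_size documents read back from disk."""
    batch = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                batch.append(json.loads(line))
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        yield batch


def build_update_request(solr_url: str, collection: str, docs: list) -> urllib.request.Request:
    """Build (but do not send) a POST to the collection's /update handler."""
    return urllib.request.Request(
        f"{solr_url}/{collection}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reindex(path: Path, solr_url: str, collection: str, batch_size: int = 500) -> int:
    """Replay every stored document into Solr; returns how many were sent."""
    sent = 0
    for batch in read_batches(path, batch_size):
        request = build_update_request(solr_url, collection, batch)
        with urllib.request.urlopen(request) as response:
            response.read()  # an HTTP error status raises before this point
        sent += len(batch)
    return sent
```

After a schema change, something like `reindex(Path("listings.jsonl"), "http://localhost:8983/solr", "listings")` would rebuild the index without re-scraping any site (the URL and collection name are placeholders). Committing on every batch is simple but slow; a single commit at the end is likely the better choice for large files.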

Best,
Erick

On Tue, Feb 21, 2017 at 5:17 PM, Dave <hastings.recursive@gmail.com> wrote:
> B is a better option long term. Solr is meant for retrieving flat data, fast, not hierarchical data.
> That's what a database is for, and trust me, you would rather have a real database on the end
> point. Each tool has a purpose: Solr can never replace a relational database, and a relational
> database could not replace Solr. Start with the slow model (database) for control/display
> and enhance with the fast model (Solr) for retrieval/search.
>
>
>
>> On Feb 21, 2017, at 7:57 PM, Robert Hume <rhume55@gmail.com> wrote:
>>
>> To learn how to properly use Solr, I'm building a little experimental
>> project with it to search for used car listings.
>>
>> Car listings appear in a variety of different places ... central places like
>> Craigslist and also many, many individual used car dealership websites.
>>
>> I am wondering, should I:
>>
>> (a) deploy a Solr search engine and build individual indexers for every
>> type of web site I want to find listings on?
>>
>> or
>>
>> (b) build my own database to store car listings, and then build services
>> that scrape data from different sites and feed entries into the database;
>> then point my Solr search to my database, one simple source of listings?
>>
>> My concerns are:
>>
>> With (a) ... I have to be smart enough to understand all those different
>> data sources and remove/update listings when they change; will this be
>> harder to do with custom Solr indexers than writing something from scratch?
>>
>> With (b) ... I'm maintaining a huge database of all my listings which seems
>> redundant; Google doesn't make a *copy* of everything on the internet, it
>> just knows it's there.  Is maintaining my own database a bad design?
>>
>> Thanks for reading!
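
To make option (b) and the remove/update concern more concrete, here is a rough sketch using Python's built-in sqlite3 module. It assumes each scraper tags every listing it sees with a crawl number, so anything a source did not report in the latest crawl can be treated as removed. The table layout, the column names, and the `source:source_id` document id scheme are hypothetical choices, not anything specified in the thread.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the listings store, creating the table on first use."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listings (
               source    TEXT NOT NULL,     -- e.g. "craigslist"
               source_id TEXT NOT NULL,     -- the site's own listing id
               title     TEXT,
               price     INTEGER,
               last_seen INTEGER NOT NULL,  -- crawl number that last saw it
               PRIMARY KEY (source, source_id)
           )"""
    )
    return conn


def upsert_listing(conn, source, source_id, title, price, crawl_id):
    """Insert a new listing or overwrite the stored copy of an existing one."""
    conn.execute(
        "INSERT OR REPLACE INTO listings VALUES (?, ?, ?, ?, ?)",
        (source, source_id, title, price, crawl_id),
    )


def remove_stale(conn, source, crawl_id):
    """Delete listings the given crawl did not see; return their document ids."""
    rows = conn.execute(
        "SELECT source_id FROM listings "
        "WHERE source = ? AND last_seen < ? ORDER BY source_id",
        (source, crawl_id),
    ).fetchall()
    conn.execute(
        "DELETE FROM listings WHERE source = ? AND last_seen < ?",
        (source, crawl_id),
    )
    return [f"{source}:{source_id}" for (source_id,) in rows]
```

The ids returned by `remove_stale` are the ones that would also need to be deleted from the Solr index, while the rows touched by `upsert_listing` in the current crawl are the ones to re-send. Each scraper then only has to understand its own site; the update/remove bookkeeping lives in one place.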
