lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Using Solr as a Database?
Date Sun, 02 Jun 2019 21:27:09 GMT
Not exactly. If I’m reading this right, you have now, and will continue to have, all the data
in the RDBMS, correct? That’s what I call the “system of record”. So you’re not talking
about getting rid of the RDBMS, but rather about copying it all over into Solr and periodically
updating your Solr indexes. So at any time, you can throw the Solr cluster away and re-create
it by ingesting from the RDBMS (or whatever data store you settle on).
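
A minimal sketch of that idea, under loud assumptions: an in-memory SQLite table stands in for the RDBMS, a plain dictionary stands in for the search index, and the `documents` table and `build_index` helper are hypothetical names, not anything from Solr or Lucene. It only illustrates that the index is derived data that can be discarded and rebuilt from the system of record.

```python
import sqlite3
from collections import defaultdict


def build_index(conn):
    """Rebuild a toy inverted index (term -> set of ids) from the system of record."""
    index = defaultdict(set)
    for doc_id, body in conn.execute("SELECT id, body FROM documents"):
        for term in body.lower().split():
            index[term].add(doc_id)
    return index


# Stand-in for the RDBMS: an in-memory SQLite table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("doc-1", "invoice approved by manager"), ("doc-2", "invoice rejected")],
)

index = build_index(conn)    # initial ingest
rebuilt = build_index(conn)  # after "throwing the index away", ingest again
```

Because nothing lives only in the index, the rebuilt copy is equivalent to the first one. The same property is what makes a re-index after an incompatible schema change possible.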

In that case, storing all your fields in Solr is perfectly reasonable, and SolrCloud will
scale as necessary. There are some practical considerations, mostly having to do with hardware.
You get HA/DR with Solr at the expense of multiple copies of the index etc.

What I and others are saying is that putting all your data in Solr then throwing the RDBMS
(or whatever) away is not a good idea.

Best,
Erick

> On Jun 2, 2019, at 1:32 PM, Ralph Soika <ralph.soika@imixs.com> wrote:
> 
> Thanks Jörn and Erick for your explanations.
> 
> What I do so far is the following:
> 
>  * I have an RDBMS with one totally flattened table holding all the data and the id.
>  * The data is unstructured. Fields can vary from document to document. I have no fixed
>    schema. A dataset is represented by a HashMap.
>  * Lucene (7.5) is perfect for indexing the data, both as analysed full text and as non-analysed fields.
> 
> The whole system is highly transactional as it runs on Java EE with JPA and Session EJBs.
> I can easily rebuild my index at any time as I have all the data in an RDBMS. And of course
> it was necessary in the past to rebuild the index for many projects after upgrading Lucene
> (e.g. from 4.x to 7.x).
> 
> So, as far as I understand, you recommend leaving the data in the RDBMS?
> 
> The problem with an RDBMS is that you cannot easily scale over many nodes with a
> masterless cluster. This is why I thought Solr could solve this problem easily. On the other
> hand, my Lucene index also did not scale over multiple nodes. Maybe Solr would be a solution to
> scale just the index?
> 
> Another solution I am working on is to store all my data in an HA Cassandra cluster because
> I do not need the core SQL functionality. But in this case I only replace the RDBMS with
> Cassandra, and Lucene/Solr again holds only the index.
> 
> So Solr can't improve my architecture, with the exception of the fact that the search
> index could be distributed across multiple nodes with Solr. Did I get that right?
> 
> 
> ===
> Ralph
> 
> 
> On 02.06.19 16:35, Erick Erickson wrote:
>> You must be able to rebuild your index completely when, at some point, you change
>> your schema in incompatible ways. For that reason, either you have to play tricks with Solr
>> (i.e. store all fields or the original document or….) or somehow have access to the original
>> document.
>> 
>> Furthermore, starting with Lucene 8, Lucene will not even open an index _ever_ touched
>> with Lucene 6. In general you can’t even open an index with Lucene X that was ever worked
>> on with Lucene X-2 (starting where X = 8).
>> 
>> That said, it’s a common pattern to put enough information into Solr that a user
>> can identify the documents they need, then go to the system-of-record for the full document,
>> whether that is an RDBMS or file system or whatever. I’ve seen lots of hybrid systems that
>> store additional data besides the id, let the user get to the document she wants, and only
>> when she clicks on a single document go to the system-of-record and fetch it. Think of a Google
>> search where the information you see as the result of a search is stored in Solr, but when
>> the user clicks on a link the original doc is fetched from someplace other than Solr.
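
The hybrid pattern described above can be sketched roughly as follows. This is an illustration only, not Solr code: the two in-memory structures and the names `SEARCH_INDEX`, `SYSTEM_OF_RECORD`, `search`, and `fetch_full_document` are assumptions made up for the example.

```python
# Hypothetical system of record: the full documents, keyed by id.
SYSTEM_OF_RECORD = {
    "doc-1": {"title": "Invoice 4711", "body": "full text of invoice 4711"},
    "doc-2": {"title": "Invoice 4712", "body": "full text of invoice 4712"},
    "doc-3": {"title": "Travel request", "body": "full text of the travel request"},
}

# Hypothetical search index: the id, the searchable terms, and just enough
# stored data (here only the title) to render a result list.
SEARCH_INDEX = [
    {"id": "doc-1", "title": "Invoice 4711", "terms": {"invoice", "4711"}},
    {"id": "doc-2", "title": "Invoice 4712", "terms": {"invoice", "4712"}},
    {"id": "doc-3", "title": "Travel request", "terms": {"travel", "request"}},
]


def search(term):
    """Answer the query from the index alone; return only id and title."""
    return [
        {"id": entry["id"], "title": entry["title"]}
        for entry in SEARCH_INDEX
        if term in entry["terms"]
    ]


def fetch_full_document(doc_id):
    """Called only when the user picks one result; reads the system of record."""
    return SYSTEM_OF_RECORD[doc_id]


hits = search("invoice")                   # result list comes from the index
full = fetch_full_document(hits[0]["id"])  # one click, one lookup elsewhere
```

The result list never touches the system of record. Only the single document the user selects is fetched from it.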
>> 
>> FWIW,
>> Erick
>> 
>>> On Jun 2, 2019, at 7:05 AM, Jörn Franke <jornfranke@gmail.com> wrote:
>>> 
>>> It depends on what you want to do with it. You can store all fields in Solr and
>>> filter on them. However, as soon as it comes to ACID guarantees, or if you need to join the
>>> data, you will probably need something other than Solr (or other workarounds, e.g. flattening
>>> the table).
>>> 
>>> Maybe you can describe more what the users do in Solr or in the database.
>>> 
>>>> Am 02.06.2019 um 15:28 schrieb Ralph Soika <ralph.soika@imixs.com>:
>>>> 
>>>> Inspired by an article written by Uwe Schindler in the latest German JavaMagazin,
>>>> I wonder if Solr can also be used as a database?
>>>> 
>>>> In our open source project Imixs-Workflow we have used Lucene <https://imixs.org/doc/engine/queries.html>
>>>> for several years with great success. We have unstructured document-like data generated
>>>> by the workflow engine. We store all the data in a transactional RDBMS in a BLOB column
>>>> and index the data with Lucene. This works great and is impressively fast, even when we use
>>>> complex queries.
>>>> 
>>>> The thing is that we do not store any fields in Lucene - only the primary
>>>> key of our dataset is stored in Lucene. The document data is stored in the SQL database.
>>>> 
>>>> Now, as far as I understand, Solr is a cluster-enabled datastore which can
>>>> also be used to store all the data from our documents.
>>>> The problem with relational databases was always the lack of cloud/cluster
>>>> support to get more stable data by using redundancy over several nodes.
>>>> 
>>>> What do you think? Is Solr an alternative to store and index data instead
>>>> of using Lucene in combination with an RDBMS?
>>>> 
>>>> 
>>>> ===
>>>> Ralph
>>>> 
> 

