lucene-solr-user mailing list archives

From andrey prokopenko <andrey4...@gmail.com>
Subject on regards to Solr and NoSQL storages integration
Date Wed, 05 Nov 2014 13:52:32 GMT
Greetings Comrades.
There have been numerous requests and discussions about using Solr as both
a search engine and a NoSQL store at the same time.
While Solr is an excellent search engine, it does not look so good when it
comes to storing documents and their stored fields, especially with large
amounts of data: the index quickly grows to an unmanageable size. Then there
is the ever-recurring PITA of partial document updates: due to the nature of
the Lucene/Solr index, documents can't be updated in place; they always have
to be deleted and re-inserted.
All in all, Solr desperately needs tight integration with a document
store that takes over the stored fields of the documents and is
transactionally coupled with the search index itself, so that the stored
fields are at all times in sync with the other parts
of the index (terms, doc values etc.).

Unfortunately, unlike Lucene, Solr does not offer a full set of distributed
transaction API commands, which seriously complicates the matter. Luckily,
since the advent of Solr 4.0 we have the ability not only to create a custom
Directory, but also to tweak the index structure any way we like.
Based on this feature I've created a custom Directory plus a custom codec
that integrate Solr with the Oracle NoSQL key-value store.
My codec is based on the Solr 4.10.1 API and Oracle NoSQL 1.2.1.8 Community
Edition. Fields in the NoSQL store are persisted under a primary key derived
from the document fields. The codec relays stored fields to the NoSQL store
while keeping all other index components in the usual file-based storage
layout. The codec was written with SolrCloud and the NoSQL store's own fault
tolerance in mind, hence it tries to ignore write commands to the NoSQL
store if the index is being created on a replica node that is not currently
the Solr shard leader. The first stable version of the codec transparently
supports the full index life cycle, including segment creation, merging and
deletion.
Source code and a readme detailing usage instructions for the codec can be
found on GitHub: https://github.com/andrey42/onsqlcodec
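
To make the idea easier to discuss, here is a toy model of the routing
described above. It is only a sketch in Python, not the codec's actual Java
code: the names primary_key and StoredFieldsRelay, the "/" key separator and
the in-memory dict standing in for the NoSQL store are all assumptions made
for illustration.

```python
from dataclasses import dataclass, field


def primary_key(doc, key_fields):
    """Derive a store key from selected document fields (hypothetical scheme)."""
    missing = [name for name in key_fields if name not in doc]
    if missing:
        raise KeyError("document lacks key fields: %s" % missing)
    return "/".join(str(doc[name]) for name in key_fields)


@dataclass
class StoredFieldsRelay:
    """Toy model: stored fields go to a key-value store, the rest stays local."""
    key_fields: list
    is_leader: bool
    kv_store: dict = field(default_factory=dict)     # stands in for the NoSQL store
    local_index: list = field(default_factory=list)  # stands in for file-based index parts

    def add_document(self, doc, stored_field_names):
        key = primary_key(doc, self.key_fields)
        # Every node keeps its own local index structures; here we only record the key.
        self.local_index.append(key)
        # Only the shard leader writes to the shared store; replicas skip the write.
        if self.is_leader:
            self.kv_store[key] = {
                name: doc[name] for name in stored_field_names if name in doc
            }
        return key

    def stored_fields(self, key):
        # Any node can read stored fields back from the shared store.
        return self.kv_store.get(key)
```

In this model a leader and a replica would be constructed with the same
kv_store object: both record the document locally, but only the leader's
add_document call changes the shared store, and either node can read the
stored fields back by key.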

I assume there are other developers trying to solve similar
problems, so I'd be interested to hear about similar attempts and the issues
encountered while implementing such an integration between Solr and
other NoSQL databases.
