jackrabbit-oak-dev mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: Elastic Search and OAK comparisons
Date Wed, 18 Dec 2013 09:16:29 GMT

On 17 December 2013 22:43, Reza Jalili <jalili@adobe.com> wrote:
> Forwarding to the open group
>>Hi Toby,
>>I've just started to take a look at elasticsearch.org / .com
>>Do you know:
>>How does oak compare with elasticsearch open source
>>search/data store?

ElasticSearch is only a distributed, elastic search index based on
Lucene, so comparing it with Oak as a whole is not a like-for-like
comparison. It is not a data store.

Many large applications, especially in the OpenData field, have
nevertheless used it as a data store, since its resilience to
unforeseen failures is high, mainly due to:
* close to real time with a data update latency often around 50ms
between update and availability in the index.
* replication and sharding with no single point of failure
* write ahead log on write giving it automated recovery.
* True elasticity.
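
To illustrate the write ahead log point, here is a minimal Python
sketch of the general technique: log the update durably, then apply
it, and replay the log on restart. It is not ElasticSearch code;
TinyIndex, its methods and the JSON-lines log format are all made up
for the example.

```python
import json
import os
import tempfile


class TinyIndex:
    """Toy in-memory index guarded by a write-ahead log (hypothetical)."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.docs = {}
        self._replay()

    def _replay(self):
        # Rebuild in-memory state from whatever operations reached the log.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as wal:
            for line in wal:
                op = json.loads(line)
                self.docs[op["id"]] = op["doc"]

    def index(self, doc_id, doc):
        # Log first, force it to disk, and only then apply the update.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"id": doc_id, "doc": doc}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.docs[doc_id] = doc


def demo():
    wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
    first = TinyIndex(wal_path)
    first.index("1", {"title": "oak"})
    first.index("2", {"title": "elasticsearch"})
    # Simulate a crash: start a fresh instance on the same log file.
    recovered = TinyIndex(wal_path)
    return recovered.docs
```

Calling demo() returns the documents as seen by the second instance,
which holds them only because it replayed the log.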

The datastore that results from an ElasticSearch deployment can be
considered a flat datastore with no inherent structure and no
versioning, i.e. billions of documents in a bucket.

If you were brave, you could write an ElasticSearchMK.

>>What are the dimensions and features that are fair to compare and

It would be fair to compare the SolrCloud component of a full Oak
deployment with ElasticSearch.

You will find differences in schema support, replication mechanism,
deployment and indexing.

Solr has schema capabilities; ES has none.
SolrCloud replicates segment data; ES replicates the index update
commands after committing to a Write Ahead Log.
SolrCloud requires several components, including Zookeeper as an HA
cluster. ES is a single jar that self-discovers peers and has no
single bookkeeping instance.
SolrCloud will index documents (PDF etc.). ES indexes keywords and
streams of tokens, leaving you to perform the conversion from document
to token.
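
As a rough idea of what "conversion from document to token" means, the
Python sketch below turns already-extracted text into tokens and
builds a tiny inverted index from them. It is only an illustration:
tokenize and build_inverted_index are hypothetical helpers, the
regex split stands in for a real Lucene analyzer, and extracting text
from a PDF is assumed to have happened beforehand.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase, then split on anything that is not a-z or 0-9.
    # A stand-in for a real analyzer, not how Lucene does it.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    # Map each token to the sorted list of document ids containing it.
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}
```

For example, build_inverted_index({"a": "Oak and Lucene", "b": "Lucene
index"}) maps "lucene" to both document ids and "oak" to "a" only.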

Storing Lucene indexes in Oak (as mentioned below) is reminiscent of
earlier work that led to ElasticSearch. There are some talks on the
ElasticSearch site that describe the issues with making Lucene-based
indexes scale.

It would not be a like-for-like comparison to compare all of Oak with
ElasticSearch, as they are very different beasts.


(I am not a core Oak contributor, but have had experience using ES and
SolrCloud in the past)

Best Regards

>>Thanks for your help,
>>On 12/17/13 1:46 AM, "Tommaso Teofili" <teofili@adobe.com> wrote:
>>>IIRC the Lucene index data is stored under /oak:index/lucene/:data
