Date: Wed, 18 Dec 2013 09:16:29 +0000
From: Ian Boston
To: "oak-dev@jackrabbit.apache.org"
Subject: Re: Elastic Search and OAK comparisons

Hi,

On 17 December 2013 22:43, Reza Jalili wrote:
> Forwarding to the open group
>
>>Hi Toby,
>>
>>I've just started to take a look at elasticsearch.org / .com
>>
>>Do you know:
>>How does oak compare with elasticsearch open source
>>search/data store?

ElasticSearch is only a distributed search index based on Lucene, so comparing it with Oak as a whole is not a like-for-like comparison. It is not a data store.

However, many large applications, especially in the OpenData field, have used it as a data store because its resilience to unforeseen failures is high, mainly due to:
* near-real-time indexing: the latency between an update and its availability in the index is often around 50 ms;
* replication and sharding with no single point of failure;
* a write-ahead log on write, giving it automated recovery;
* true elasticity.

The data store that results from an ElasticSearch deployment can be considered a flat data store with no inherent structure and no versioning, i.e. billions of documents in a bucket. If you were brave, you could write an ElasticSearchMK (an Oak MicroKernel backed by ElasticSearch).

>
>>What are the dimensions and features that are fair to compare and
>>understand?

It would be fair to compare the SolrCloud component of a full Oak deployment with ElasticSearch. You will find differences in schema support, replication mechanism, deployment and indexing.
* Schema: Solr has schema capabilities; ES has none.
* Replication: SolrCloud replicates segment data; ES replicates the index update commands after committing them to a write-ahead log.
* Deployment: SolrCloud requires several components, including ZooKeeper as an HA cluster. ES is a single jar that discovers its peers automatically and has no single bookkeeping instance.
* Indexing: SolrCloud will index documents (PDF etc.). ES indexes keywords and streams of tokens, leaving you to perform the conversion from document to tokens.

Storing Lucene indexes in Oak (as mentioned below) is reminiscent of the earlier work that led to ElasticSearch. There are some talks on the ElasticSearch site that describe the issues with making Lucene-based indexes scale.

Comparing all of Oak with ElasticSearch would not be a like-for-like comparison, as they are very different beasts.

HTH
(I am not a core Oak contributor, but have had experience using ES and SolrCloud in the past.)

Best Regards
Ian

>>
>>Thanks for your help,
>>-reza
>>
>>On 12/17/13 1:46 AM, "Tommaso Teofili" wrote:
>>
>>>IIRC the Lucene index data is stored under /oak:index/lucene/:data
>>>
>>>Regards,
>>>Tommaso
>
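The replication difference mentioned above (ES shipping logged update commands, SolrCloud copying segment data) can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not code from ElasticSearch, SolrCloud or Oak: the class `CommandLogNode` and the command format `{"op": ..., "id": ..., "doc": ...}` are hypothetical. Each valid update command is appended to a write-ahead log before it is applied, the same command (not index files) is then forwarded to replicas, and a node that loses its in-memory index rebuilds it by replaying its log. Segment replication, which copies finished index files instead, is not modelled.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CommandLogNode:
    """Hypothetical toy node: a write-ahead log of commands plus an in-memory index.

    Illustration only; real ElasticSearch nodes use Lucene segments and a
    transaction log with a far more involved replication protocol.
    """
    wal: list = field(default_factory=list)    # stand-in for a durable log file
    index: dict = field(default_factory=dict)  # doc id -> document
    replicas: list = field(default_factory=list)

    def _apply(self, command: dict) -> None:
        # Apply one already-validated update command to the in-memory index.
        if command["op"] == "index":
            self.index[command["id"]] = command["doc"]
        else:  # "delete"
            self.index.pop(command["id"], None)

    def execute(self, command: dict) -> None:
        # Reject bad commands before they reach the log, so replay never fails.
        if command.get("op") not in ("index", "delete"):
            raise ValueError(f"unknown op: {command.get('op')!r}")
        # 1. Commit the command to the write-ahead log first...
        self.wal.append(json.dumps(command))
        # 2. ...then apply it locally...
        self._apply(command)
        # 3. ...then forward the same command (not index files) to each replica,
        #    which logs and applies it in turn.
        for replica in self.replicas:
            replica.execute(command)

    def recover(self) -> None:
        # The in-memory index is treated as lost; replaying the log rebuilds it.
        self.index = {}
        for line in self.wal:
            self._apply(json.loads(line))


# Usage: one primary with one replica.
replica = CommandLogNode()
primary = CommandLogNode(replicas=[replica])
primary.execute({"op": "index", "id": "1", "doc": {"title": "Oak"}})
primary.execute({"op": "index", "id": "2", "doc": {"title": "ES"}})
primary.execute({"op": "delete", "id": "1"})
assert primary.index == replica.index == {"2": {"title": "ES"}}

replica.index = {}  # simulate a crash that loses the in-memory index
replica.recover()
assert replica.index == {"2": {"title": "ES"}}
```

Roughly speaking, shipping commands keeps the network payload small but makes every replica redo the indexing work; copying segments avoids the repeated indexing at the cost of transferring larger files.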