Mailing-List: contact dev-help@jackrabbit.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@jackrabbit.apache.org
Received-SPF: pass (nike.apache.org: domain of a.schrijvers@1hippo.com
 designates 64.18.2.155 as permitted sender)
Received-SPF: pass (google.com: domain of a.schrijvers@1hippo.com designates
 10.182.118.34 as permitted sender) client-ip=10.182.118.34;
MIME-Version: 1.0
In-Reply-To: 
 <9C0FC4C8E9C29945B01766FC7F9D389816E3A56CC4@eurmbx01.eur.adobe.com>
References: <4F4527A9.6060507@adobe.com>
	<9C0FC4C8E9C29945B01766FC7F9D389816E3A56CC4@eurmbx01.eur.adobe.com>
Date: Thu, 23 Feb 2012 11:26:19 +0100
Message-ID: 
 <CABXgNGkfqpW6nDoKvnBWUguWh3UVh7jR_+C9ePXfL25g2ZcW6A@mail.gmail.com>
Subject: Re: [jr3 trade consistency for availability]
From: Ard Schrijvers <a.schrijvers@onehippo.com>
To: dev@jackrabbit.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Thu, Feb 23, 2012 at 9:09 AM, Marcel Reutegger <mreutegg@adobe.com> wrote:
>
>> - Lock enforcement?
>
> that's definitely a tough one because it depends on repository
> wide state.
>
>> - Query index consistency?
>
> I think consistency is a prerequisite here, otherwise it's quite
> difficult to implement the query functionality. I'd rather

Personally, I am a strong advocate of eventual index consistency,
perhaps because I just cannot see how strict index consistency could
ever be implemented without putting hard constraints on performance.

But before discussing the details: what exactly is meant by 'query
index consistency'?

Does this mean that the indexes should be consistent with the latest
persisted data? That is, within a single cluster node, must the index
be updated immediately after a persist? Would this mean that new search
requests are blocked until the indexing queue is emptied? Or does it
mean that an index should be consistent across a cluster? The latter
isn't the case for Jackrabbit 2 anyway, right?
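The single-node readings above can be sketched with a toy index. This is a minimal sketch, assuming a hypothetical `ToyIndex` class (not Jackrabbit's API): `persist()` only enqueues the change, a background thread applies it later, `search_eventual()` reads whatever has been indexed so far, and `search_consistent()` first blocks until the indexing queue is drained.

```python
import queue
import threading
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory term index fed by an asynchronous indexing queue."""

    def __init__(self):
        self._pending = queue.Queue()        # changes persisted but not yet indexed
        self._postings = defaultdict(set)    # term -> set of node ids
        self._lock = threading.Lock()
        worker = threading.Thread(target=self._index_loop, daemon=True)
        worker.start()

    def persist(self, node_id, text):
        # The write is done; indexing is deferred to the background thread.
        self._pending.put((node_id, text))

    def _index_loop(self):
        while True:
            node_id, text = self._pending.get()
            with self._lock:
                for term in text.lower().split():
                    self._postings[term].add(node_id)
            self._pending.task_done()

    def search_eventual(self, term):
        # Eventual consistency: never waits, may miss very recent writes.
        with self._lock:
            return set(self._postings.get(term.lower(), ()))

    def search_consistent(self, term):
        # Strict variant: block until every queued change has been indexed.
        self._pending.join()
        return self.search_eventual(term)


idx = ToyIndex()
idx.persist("n1", "Hello world")
print(idx.search_consistent("hello"))   # {'n1'}
```

`queue.Queue.join()` only returns once every queued item has been processed, so under a steady stream of writes from other sessions `search_consistent()` can stall for a long time; that waiting is exactly the performance cost of the blocking variant.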

Whichever of the constraints above is meant doesn't really matter
afaics, as I don't see how either of them could be implemented
efficiently... at least, not with Lucene in the back of my head. And
even if all that effort is made and all the burden of creating index
consistency is accepted, we still don't have transactional searches,
so the search results could still contain nodes that were removed
after the search was executed.
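That last point holds even when the index is kept perfectly in sync. A minimal sketch, assuming a hypothetical `ToyRepository` whose index is updated synchronously on every write: the hits are still only a snapshot, so code that resolves them has to tolerate nodes removed in the meantime.

```python
class ToyRepository:
    """Hypothetical node store whose term index is updated synchronously."""

    def __init__(self):
        self.nodes = {}   # node id -> text
        self.index = {}   # term -> set of node ids

    def add(self, node_id, text):
        self.nodes[node_id] = text
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(node_id)

    def remove(self, node_id):
        text = self.nodes.pop(node_id)
        for term in text.lower().split():
            self.index[term].discard(node_id)

    def search(self, term):
        # Consistent at the moment of the call, but only a snapshot of ids.
        return sorted(self.index.get(term.lower(), ()))

    def resolve(self, hits):
        # Skip hits whose nodes were removed after the search ran.
        return [self.nodes[i] for i in hits if i in self.nodes]


repo = ToyRepository()
repo.add("a", "jackrabbit index")
repo.add("b", "lucene index")
hits = repo.search("index")   # ['a', 'b']
repo.remove("a")              # e.g. another session deletes node "a"
print(repo.resolve(hits))     # ['lucene index']
```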

I do understand that relaxing query index consistency most likely
makes it really hard to implement the (specification) query
functionality. But this might also be a result of the specification
itself. Over the years I've come to believe that a generic
hierarchical JCR full-text index and query model is a bad idea: in the
end, it just doesn't scale, it is extremely complex to build (Lucene is
flat), and, even worse, it doesn't seem to satisfy customers/developers:
they want to index and search *their* specific model that they store
in Jackrabbit. You can tweak things a bit with indexing_configuration
and the like, but in the end, I think a (Lucene) index is just too
domain specific.

If you need a consistent query, because you want to store and query
something like bank accounts, you shouldn't use Jackrabbit (or some
NoSQL db) in the first place, imo.
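To make the point about a hierarchical index on top of flat Lucene a bit more concrete: one common way to support hierarchical queries in a flat index is to copy each node's ancestor paths into its document. That encoding is assumed here for illustration only and is not necessarily how Jackrabbit 2 does it; the helper functions below are hypothetical. The sketch shows the cost: moving a subtree forces the document of every descendant to be rewritten.

```python
def ancestor_paths(path):
    """'/a/b/c' -> ['/', '/a', '/a/b']; the root has no ancestors."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return []
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def to_flat_doc(path, props):
    # The hierarchy is copied into every flat document so that a descendant
    # query ("everything below /news") becomes a simple term lookup.
    return {"path": path, "ancestors": ancestor_paths(path), **props}


def docs_to_reindex_on_move(all_paths, src):
    # Moving the subtree at `src` changes the ancestors of the node itself and
    # of every descendant, so all of their flat documents must be rewritten.
    prefix = src.rstrip("/") + "/"
    return [p for p in all_paths if p == src or p.startswith(prefix)]


paths = ["/news", "/news/2012", "/news/2012/jr3", "/newsletter"]
print(to_flat_doc("/news/2012/jr3", {"title": "jr3"})["ancestors"])
# ['/', '/news', '/news/2012']
print(docs_to_reindex_on_move(paths, "/news"))
# ['/news', '/news/2012', '/news/2012/jr3']
```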

Regards Ard

> make compromises for availability, e.g. terminate a long query

>
> regards
> marcel
>