jackrabbit-users mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........
Date Mon, 29 Nov 2010 11:54:20 GMT
On Mon, Nov 29, 2010 at 11:26 AM, Alexander Klimetschek
<aklimets@adobe.com> wrote:
> But what is the use case for this? Why store a full-text index
> implementation that is totally unrelated to the DB inside a database layer
> that just makes it perform worse, use more disk space, etc.? It's like
> implementing a database index by storing it in another database...

Exactly! But you are missing one crucial thing. First of all, the Lucene
index could be tens of times smaller than it currently is; this is
possible if we make it more configurable. Secondly, performance isn't
worse, as the entire Lucene index is kept in memory. But the crucial
part is scalability: in a clustered setup, you can use Infinispan
(formerly JBoss Cache) to have a replicated in-memory Lucene index. This
means only one node in the cluster needs to do the indexing; the other
nodes get the index replicated. Because it is all in memory, 2 or 3
cluster nodes can, for example, be assigned to flush their (new)
in-memory segments to a database now and then. This is just a 'backup'
for when the entire cluster goes down: it is not used by Lucene, only
for bootstrapping when the cluster starts. So this scenario adds a lot
of potential; for example, bringing a new node into the cluster is
instant. Hibernate, which has needs very similar to Jackrabbit's, uses
the technique I just described.
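A minimal, stdlib-only sketch of the scheme described above, assuming plain maps as stand-ins for Infinispan's replicated cache and for the database table. ReplicatedSegmentCache, SegmentBackupTable and ClusterNode are hypothetical names; this is not Infinispan's, Lucene's or Jackrabbit's API. It also assumes segment files never change once written, so a periodic flush only has to copy names the backup does not hold yet.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a replicated in-memory cache (the role Infinispan
// plays above): every node sharing this instance sees each segment file as
// soon as it is written.
class ReplicatedSegmentCache {
    private final Map<String, byte[]> segments = new ConcurrentHashMap<>();

    void put(String name, byte[] data) {
        segments.put(name, data.clone());
    }

    byte[] get(String name) {
        byte[] data = segments.get(name);
        return data == null ? null : data.clone();
    }

    Set<String> names() {
        return new HashSet<>(segments.keySet());
    }
}

// Hypothetical stand-in for a database table storing each segment file
// as a whole, one blob per row.
class SegmentBackupTable {
    private final Map<String, byte[]> rows = new HashMap<>();

    synchronized boolean contains(String name) {
        return rows.containsKey(name);
    }

    synchronized void insert(String name, byte[] blob) {
        rows.put(name, blob.clone());
    }

    synchronized Map<String, byte[]> loadAll() {
        Map<String, byte[]> copy = new HashMap<>();
        rows.forEach((name, blob) -> copy.put(name, blob.clone()));
        return copy;
    }
}

class ClusterNode {
    private final ReplicatedSegmentCache cache;
    private final boolean indexer;

    ClusterNode(ReplicatedSegmentCache cache, boolean indexer) {
        this.cache = cache;
        this.indexer = indexer;
    }

    // Only the single indexing node writes segments; the others just read
    // the replicated copies.
    void writeSegment(String name, byte[] data) {
        if (!indexer) {
            throw new IllegalStateException("only the indexing node writes segments");
        }
        cache.put(name, data);
    }

    // Run now and then on the 2 or 3 nodes assigned to back up the index:
    // copies only segments the table does not hold yet (segments are
    // assumed immutable once written).
    int flushNewSegments(SegmentBackupTable backup) {
        int flushed = 0;
        for (String name : cache.names()) {
            if (!backup.contains(name)) {
                backup.insert(name, cache.get(name));
                flushed++;
            }
        }
        return flushed;
    }

    // Used only after the whole cluster went down: rebuild the in-memory
    // cache from the database backup.
    static ReplicatedSegmentCache bootstrap(SegmentBackupTable backup) {
        ReplicatedSegmentCache cache = new ReplicatedSegmentCache();
        backup.loadAll().forEach(cache::put);
        return cache;
    }
}
```

A node joining a running cluster simply constructs a ClusterNode on the shared cache and sees every segment at once, which is the "instant" join mentioned above. The sketch never deletes rows for segments that Lucene has since merged away; a real implementation would also need to prune those and track which segments_N generation is current.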

As a bonus, we might get rid of the database-persisted changelog (or
whatever it is called). It is meant for nodes in a cluster to
a) evict their caches
b) index new nodes

(b) is no longer needed, as we have index replication.
(a) could be replaced by JMS, which seems more natural to me.
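Point (a) amounts to a broadcast: whichever node saves a change tells every other node to drop its cached copy. A minimal sketch of that pattern, assuming an in-process list of listeners as a stand-in for a JMS topic; EvictionTopic and NodeCache are hypothetical names, not Jackrabbit classes. In a real deployment each node would presumably register a javax.jms.MessageListener on a shared javax.jms.Topic instead.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical in-process stand-in for a JMS topic: every subscriber receives
// every published message, together with the node that published it.
class EvictionTopic {
    private final List<BiConsumer<Object, String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(BiConsumer<Object, String> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(Object sender, String itemId) {
        for (BiConsumer<Object, String> subscriber : subscribers) {
            subscriber.accept(sender, itemId);
        }
    }
}

// One cluster node's item cache: it evicts an entry whenever another node
// announces that it changed that item.
class NodeCache {
    private final Map<String, String> items = new ConcurrentHashMap<>();
    private final EvictionTopic topic;

    NodeCache(EvictionTopic topic) {
        this.topic = topic;
        // Ignore our own announcements so a local write is not evicted again.
        topic.subscribe((sender, itemId) -> {
            if (sender != this) {
                items.remove(itemId);
            }
        });
    }

    String get(String itemId) {
        return items.get(itemId);
    }

    void cache(String itemId, String state) {
        items.put(itemId, state);
    }

    // Local write: update our own cache, then tell every other node to drop
    // its now-stale copy.
    void save(String itemId, String state) {
        items.put(itemId, state);
        topic.publish(this, itemId);
    }
}
```

Filtering out a node's own messages matters once delivery is asynchronous, as with JMS: otherwise a node could evict the entry it has just written.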

The only drawback is that the current JR Lucene implementation does not
fit the InfinispanDirectory (Infinispan's Lucene directory). This is
because of the multi-index, never-reopen setup in JR: it was state of
the art against Lucene 1.4, but is now mostly redundant.

Anyway, in due time we need to pick this up on the dev list.

Regards Ard

> Regards,
> Alex
> On 29.11.10 09:37, "Ard Schrijvers" <a.schrijvers@onehippo.com> wrote:
>>On Mon, Nov 29, 2010 at 9:21 AM, Thomas Mueller <mueller@adobe.com> wrote:
>>> Hi,
>>> Jackrabbit currently uses Lucene. According to the Lucene FAQ it should
>>>be possible, but I'm not sure what changes (if any) would be required in
>>Most likely only a database-based LuceneDirectory would have to be
>>created. Although the FAQ states that it is possible, I don't think it
>>can ever perform well. Once again, I believe in storing segments as a
>>whole in a database to have a persisted Lucene index, but you cannot
>>efficiently access these segments in a database. You then need to keep
>>the index in memory.
>>> I'm not aware of plans to support this feature. Patches are welcome of
>>>course :-)
>>First of all, we would have to do some housekeeping in the existing
>>implementation. The entire 'multi-index' setup, meant for 'near real
>>time searches', is no longer needed with newer Lucene versions. Also,
>>the way properties are indexed should be replaced. Note that these
>>changes will most likely already affect about 100 of the 200 query
>>classes (where quite a lot can be removed as well). So this is a lot
>>of work. After that, we can preferably move to the Lucene 4.x
>>versions. So a patch is welcome, but don't take it too lightly :)
>>Regards Ard
>>> Regards,
>>> Thomas
>>> From: Atul Kumar Tripathi
>>> Reply-To:
>>> Date: Mon, 29 Nov 2010 05:06:31 +0000
>>> To: users@jackrabbit.apache.org
>>> Subject: Functionality to store indexes in database with jackrabbit 2.1.2 or upcoming releases.........
>>> Hi guys,
>>> Is it possible to store indexes in a database using Jackrabbit 2.1.2?
>>> Are there any plans to provide functionality to store indexes in a
>>> database in upcoming Jackrabbit releases?
>>> Thanks in advance.
>>> Thanks & Regards.
>>> Atul Tripathi
>>> Senior Engineer  |  Datacert/Technology.
>>> Virtusa India Pvt. Ltd.
>>> Chennai ATC
>>> Phone:  +91 44 66127000 Ext:  3742 | Mobile:  +91 9940483180
>>Europe  •  Amsterdam  Oosteinde 11  •  1017 WT Amsterdam  •  +31 (0)20 522 4466
>>USA  •  San Francisco  185 H Street Suite B  •  Petaluma CA 94952-5100  •  +1 (707) 773 4646
>>Canada  •  Montréal  5369 Boulevard St-Laurent  •  Montréal QC H2T 1S5  •  +1 (514) 316 8966
>>www.onehippo.com  •  www.onehippo.org  •  info@onehippo.com
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel

