jackrabbit-users mailing list archives

From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject Equality and ordering conditions JSR 170 and JSR 283 and JR current implementation regarding index
Date Mon, 06 Aug 2007 11:46:50 GMT
Hello,

I have some questions about equality and ordering of nodes and properties
in JSR 170 and JSR 283. IIUC, you can configure a node type to have orderableChildNodes.
I suppose this ordering is stored in the DB or FS, depending on what you use to persist
your data.

Now, AFAICS, even if you did not set orderableChildNodes, you can still query
nodes and order the results by some property (through the UN_TOKENIZED Lucene field). IIUC,
equality in an XPath or SQL query is also evaluated against the Lucene index.

From the last sentence of JSR-283 4.6.2, "Support of equality and order comparison
of BINARY values is not required", I understand that support for equality and order *is*
required for non-binary values. The current JR implementation therefore indexes (UN_TOKENIZED)
the string value of *every* property as one single Lucene term (see NodeIndexer
addStringValue). But, IMHO, who wants to order on the text body of a document, or test the
body for equality with a string comparison? Ordering and equality are used on things like
author and date, not on document contents.

So, IMO, it would be better for the specification to allow configuration that indicates
whether ordering or equality is supported for a property. If that is not possible, I think we
might need to alter the current Jackrabbit implementation so that it can be configured, per
property, "how" equality and ordering are implemented. The reason is that with representative
data, for example about 10 properties per document, of which one is "body" (~10 kB),
1/3 of the index consists of the *never* used UN_TOKENIZED *body* property (a single Lucene
term that is 99.9999% sure to be unique). This really is a waste. If the JSR is reluctant to
make equality configurable, we could store a checksum of larger values as the Lucene term
instead, though then equality is no longer 100% guaranteed, which is probably pretty undesirable.
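
To make the checksum alternative concrete, here is a minimal sketch in plain Java. It is not Jackrabbit code: the class name, the 255-character threshold, the "sha1:" prefix and the choice of SHA-1 are all assumptions made for illustration. Short values are kept as the exact term; longer values are replaced by a fixed-size digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumTerm {

    // Hypothetical threshold: longer values are indexed by digest only.
    static final int MAX_EXACT_LENGTH = 255;

    // Returns the string that would be stored as the single untokenized term.
    static String termFor(String value) {
        if (value.length() <= MAX_EXACT_LENGTH) {
            return value; // exact term: equality and ordering both work
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder("sha1:");
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two hex digits per byte
            }
            // Fixed-size term: equal values always match, but two different
            // values could in principle collide, and ordering is meaningless.
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

With this scheme a ~10 kB body costs 45 characters in the term dictionary instead of ~10 kB, at the price of probabilistic equality and no usable ordering for the large values.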

My preference (easy to achieve, because I already implemented it locally) would be to allow
equality/ordering to be set to false in the upcoming 1.4 IndexingConfiguration [1]. Then you can
simply configure the body property, for example, not to be added to the index as UN_TOKENIZED.
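
As a rough illustration of what such a setting would decide at indexing time, here is a small sketch in plain Java. It does not use the Jackrabbit or Lucene APIs; the class, the method and the field labels "fulltext" and "untokenized" are hypothetical, and it simplifies by assuming every property is also full-text indexed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UntokenizedFilter {

    // Property names configured with equality/ordering set to false.
    private final Set<String> noExactTerm;

    UntokenizedFilter(String... propertyNames) {
        this.noExactTerm = new HashSet<>(Arrays.asList(propertyNames));
    }

    // Lists the kinds of index fields that would be written for a property.
    List<String> fieldsFor(String propertyName) {
        List<String> fields = new ArrayList<>();
        fields.add("fulltext"); // tokenized text, used for full-text search
        if (!noExactTerm.contains(propertyName)) {
            fields.add("untokenized"); // single term, used for = and order by
        }
        return fields;
    }
}
```

For example, `new UntokenizedFilter("body").fieldsFor("body")` yields only the full-text field, while "author" and "date" keep the single-term field needed for equality and ordering.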


WDOT?

Regards Ard

[1] http://wiki.apache.org/jackrabbit/IndexingConfiguration


-- 

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
-------------------------------------------------------------
a.schrijvers@hippo.nl / ard@apache.org / http://www.hippo.nl
-------------------------------------------------------------- 
