jackrabbit-dev mailing list archives

From "Alex Parvulescu (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (JCR-2906) Multivalued property sorted by last/random value
Date Thu, 24 Nov 2011 09:50:41 GMT

     [ https://issues.apache.org/jira/browse/JCR-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Parvulescu updated JCR-2906:
---------------------------------

    Attachment: JCR-2906.patch

Really good analysis, thanks for pointing out where the problem is!

The problem is not whether the JCR spec defines sorting on a multi-valued property (MVP);
the problem is that the sort behavior is not stable when dealing with MVPs.

As Paul correctly pointed out, whenever an MVP is present, the value in the cache gets
overwritten by the last value found by the Lucene term query. So in fact an MVP is represented
in the sort by just one of its values (which can apparently change at runtime; this is easily
reproducible by running the attached test a few times).

The solution is to use the position info that comes via Lucene's TermPositions. It
contains the term's position within the current document, allowing us to use it as an index
for MVPs.
The downside is that the Comparables have to support arrays as well as simple values, so I've
added a class (ComparableArray) that simply delegates compareTo calls to the inner array of
Comparables. This way all the query languages (XPath, SQL and SQL2) sort MVPs in a similar
way.
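
To illustrate the delegation idea described above, here is a minimal sketch. It is not the
class from the attached patch: the name ComparableArraySketch, the restriction to String
values, and the tie-breaking rule (a shorter array sorts first when all shared positions are
equal) are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of an array-backed Comparable; the real ComparableArray
 * in JCR-2906.patch may differ in naming, typing and tie-breaking.
 */
final class ComparableArraySketch implements Comparable<ComparableArraySketch> {

    // Values of one property, kept in the order in which they were stored.
    private final String[] values;

    ComparableArraySketch(String... values) {
        this.values = values.clone();
    }

    @Override
    public int compareTo(ComparableArraySketch other) {
        // Delegate to the elements, position by position.
        int shared = Math.min(values.length, other.values.length);
        for (int i = 0; i < shared; i++) {
            int c = values[i].compareTo(other.values[i]);
            if (c != 0) {
                return c;
            }
        }
        // Assumption: if all shared positions are equal, the shorter array sorts first.
        return Integer.compare(values.length, other.values.length);
    }

    @Override
    public String toString() {
        return Arrays.toString(values);
    }

    /** Sorts the three records from the issue description. */
    static List<ComparableArraySketch> sortedExample() {
        List<ComparableArraySketch> records = new ArrayList<>();
        records.add(new ComparableArraySketch("dddd", "bbbb")); // 3rd record (multi-valued)
        records.add(new ComparableArraySketch("cccc"));         // 2nd record
        records.add(new ComparableArraySketch("aaaa"));         // 1st record
        Collections.sort(records);
        return records;
    }
}
```

With single values treated as one-element arrays, the multi-valued record is compared by its
first value ("dddd") rather than by whichever value happened to be cached last, so the three
records from the report sort deterministically.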



Attaching patch.

                
> Multivalued property sorted by last/random value
> ------------------------------------------------
>
>                 Key: JCR-2906
>                 URL: https://issues.apache.org/jira/browse/JCR-2906
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: indexing
>    Affects Versions: 2.2
>         Environment: Windows 7, Sun JDK 1.6.0_23
>            Reporter: Paul Lysak
>              Labels: multivalued, sort
>         Attachments: JCR-2906.patch
>
>
> Sorting on a multivalued property may produce an incorrect result because sorting is performed
> only by the last value of the multivalued property.
> Steps to reproduce:
> 1. Create a multivalued field in the repository. Example from a nodetypes file:
> <propertyDefinition name="MyProperty" requiredType="String" autoCreated="false" mandatory="false"
>    onParentVersion="COPY" protected="false" multiple="true">
> 2. Create a few records so that all records except one contain a single value for MyProperty,
> and one record contains a first value that is greater than that of any other record and a
> second value that is somewhere in the middle. Here is an example:
> 1st record: "aaaa"
> 2nd record: "cccc"
> 3rd record: "dddd", "bbbb"
> 3. Run a query that sorts on MyProperty. Example XPath query:
> //*[...here are some criteria...] order by @MyProperty ascending
> The query returns the documents in this order:
> "aaaa"
> "dddd", "bbbb"
> "cccc"
> which is not the expected order (the expected order is the order in which they were entered,
> since "aaaa" < "cccc" and "cccc" < "dddd").
> After some digging I found out that this happens because the method
> org.apache.jackrabbit.core.query.lucene.SharedFieldCache.getValueIndex
> (called from the org.apache.jackrabbit.core.query.lucene.SharedFieldSortComparator.SimpleScoreDocComparator
> constructor)
> returns only the last Comparable of the document. Here it overwrites the previous value:
> retArray[termDocs.doc()] = getValue(value, type);
> I tried to concatenate comparables (just to check if it would work for my case):
> 	if(retArray[termDocs.doc()] == null) {
> 		retArray[termDocs.doc()] = getValue(value, type);
> 	} else {
> 		retArray[termDocs.doc()] =
> 				retArray[termDocs.doc()] + " " + getValue(value, type);
> 	}
> But it didn't work well either: TermEnum returns terms in a different order than Jackrabbit
> returns the values of a multivalued field
> (for example, ["qwer", "asdf"] may become ["asdf", "qwer"]). So simple concatenation
> doesn't help.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
