jackrabbit-oak-issues mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-8166) Index definition with orderable property definitions with and without functions breaks index
Date Thu, 04 Apr 2019 14:31:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808720#comment-16808720 ]

Thomas Mueller edited comment on OAK-8166 at 4/4/19 2:30 PM:
-------------------------------------------------------------

[~tmueller], [~catholicon] - the field names are not colliding here. Below are the document
values with and without the issue.

{noformat}
// with issue
 Document<stored,indexed,omitNorms,indexOptions=DOCS_ONLY<:path:/test>
 docValueType=SORTED<:dvjcr:content/n0/testOak:[74 65 73 74]>
 docValueType=SORTED<:dvjcr:content/n0/testOak:[74 65 73 74]>
 docValueType=SORTED<:dvfunction*upper*@jcr:content/n0/testOak:[54 45 53 54]>
 indexed,omitNorms,indexOptions=DOCS_ONLY<function*upper*@jcr:content/n0/testOak:TEST>>

// without issue (not with the fix, but with "ordered" removed from the property definition that uses the function)
 Document<stored,indexed,omitNorms,indexOptions=DOCS_ONLY<:path:/test>
 docValueType=SORTED<:dvjcr:content/n0/testOak:[74 65 73 74]>
 indexed,omitNorms,indexOptions=DOCS_ONLY<function*upper*@jcr:content/n0/testOak:TEST>>
{noformat}

So the problem is not that the field names for the function and non-function instances of
the property definition are the same.
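To make the naming point concrete, here is a minimal sketch that rebuilds the two doc-values field names seen in the dump. It assumes, purely from the dump and not from Oak's source, that a doc-values field is named ":dv" + field name and that a function field is named "function*" + a function code such as "upper*@jcr:content/n0/testOak". The class and method names are hypothetical.

```java
// Hypothetical helper; the naming rules are inferred from the document dump,
// not copied from Oak's FieldNames or FunctionIndexProcessor.
final class FieldNameSketch {

    // Assumption: sorted doc-values fields use the ":dv" prefix seen in the dump.
    static String docValuesFieldName(String fieldName) {
        return ":dv" + fieldName;
    }

    // Assumption: function fields use the "function*" prefix seen in the dump.
    static String functionFieldName(String functionCode) {
        return "function*" + functionCode;
    }

    public static void main(String[] args) {
        String plain = docValuesFieldName("jcr:content/n0/testOak");
        String function = docValuesFieldName(functionFieldName("upper*@jcr:content/n0/testOak"));
        System.out.println(plain);                  // :dvjcr:content/n0/testOak
        System.out.println(function);               // :dvfunction*upper*@jcr:content/n0/testOak
        System.out.println(plain.equals(function)); // false: the names do not collide
    }
}
```

The two names differ, which matches the dump: the duplicate is the plain field appearing twice, not a clash between the plain and function fields.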

As I mentioned in my last comment, the problem is that in the current scenario the field
:dvjcr:content/n0/testOak is added twice, because of the flow that I described.
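Why a duplicate plain field is fatal can be sketched without Lucene. The class below is a hypothetical stand-in, not Lucene's API: it only mimics the rule from the stack trace in the issue, where building the document accepts repeated names but indexing it fails once a sorted doc-values name appears a second time.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the "one value per sorted doc-values field" rule.
final class DuplicateDocValuesSketch {

    // One sorted doc-values field as it appears in the dump: a name and a value.
    static final class SortedField {
        final String name;
        final String value;

        SortedField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Mimics the writer-side check: duplicates are only detected at indexing time.
    static void index(List<SortedField> document) {
        Set<String> seen = new HashSet<>();
        for (SortedField field : document) {
            if (!seen.add(field.name)) {
                throw new IllegalArgumentException("DocValuesField \"" + field.name
                        + "\" appears more than once in this document"
                        + " (only one value is allowed per field)");
            }
        }
    }

    public static void main(String[] args) {
        // The "with issue" document: the plain ordered field appears twice.
        List<SortedField> withIssue = List.of(
                new SortedField(":dvjcr:content/n0/testOak", "test"),
                new SortedField(":dvjcr:content/n0/testOak", "test"),
                new SortedField(":dvfunction*upper*@jcr:content/n0/testOak", "TEST"));
        try {
            index(withIssue);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // The "without issue" document has a single sorted field and indexes fine.
        index(List.of(new SortedField(":dvjcr:content/n0/testOak", "test")));
        System.out.println("without issue: indexed");
    }
}
```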

It doesn't affect non-relative properties because of [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/IndexDefinition.java#L1179#L1198]:
for non-relative properties the propAggregate list there is empty (because of the checks at
lines 1183 and 1197), so the matchers list is empty as well, and the fields are finally added
to the document by [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/editor/FulltextDocumentMaker.java#L138]
(for field_name(ordered(name))) and by [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/editor/FulltextDocumentMaker.java#L150]
(for field_name(ordered(function(name)))).
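A minimal sketch of the routing described above, assuming that a property definition counts as "relative" when its name contains a '/' and that only relative definitions produce propAggregate entries. That criterion is an assumption standing in for the checks at lines 1183 and 1197; the class and method names are hypothetical, not Oak's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: which property definitions end up in propAggregate.
final class RoutingSketch {

    // Assumption: a name with a path separator is a relative property.
    static boolean isRelative(String propertyName) {
        return propertyName.contains("/");
    }

    // Only relative definitions get an aggregate entry (and therefore a matcher).
    static List<String> propAggregates(List<String> propertyNames) {
        List<String> result = new ArrayList<>();
        for (String name : propertyNames) {
            if (isRelative(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Non-relative: empty list, no matchers, fields go through the direct path.
        System.out.println(propAggregates(List.of("testOak")).isEmpty()); // true
        // Relative: an entry exists, so the matcher/onResult path is taken instead.
        System.out.println(propAggregates(List.of("jcr:content/n0/testOak"))); // [jcr:content/n0/testOak]
    }
}
```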

For relative properties, the code block at [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/editor/FulltextDocumentMaker.java#L138]
doesn't come into play. Instead, [https://github.com/apache/jackrabbit-oak/blob/trunk/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/editor/FulltextDocumentMaker.java#L146]
executes the flow in which onResult is called twice (for the reasons mentioned above), and that
adds field_name(ordered(name)) twice.
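The effect of onResult firing twice can be sketched with a hypothetical callback. The interface name mirrors onResult from this comment, but its signature is invented for illustration; it is not Oak's API and not necessarily what the attached patch does. The guarded variant shows one possible defensive measure.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; names and signatures are invented for the sketch.
final class OnResultSketch {

    interface ResultHandler {
        void onResult(String propertyPath, String value);
    }

    // Naive handler: every call appends a sorted doc-values field.
    static ResultHandler naive(List<String> fields) {
        return (path, value) -> fields.add(":dv" + path);
    }

    // Guarded handler: a doc-values field name is added at most once per document.
    static ResultHandler guarded(List<String> fields) {
        Set<String> added = new HashSet<>();
        return (path, value) -> {
            String name = ":dv" + path;
            if (added.add(name)) {
                fields.add(name);
            }
        };
    }

    // Simulates the relative-property flow in which onResult runs twice for one property.
    static List<String> run(boolean useGuard) {
        List<String> fields = new ArrayList<>();
        ResultHandler handler = useGuard ? guarded(fields) : naive(fields);
        for (int call = 0; call < 2; call++) {
            handler.onResult("jcr:content/n0/testOak", "test");
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [:dvjcr:content/n0/testOak, :dvjcr:content/n0/testOak]
        System.out.println(run(true));  // [:dvjcr:content/n0/testOak]
    }
}
```

The guard only hides the second call. If the two calls could carry different values, one would be silently dropped, so removing the reason onResult runs twice is the cleaner fix.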

[https://github.com/apache/jackrabbit-oak/blob/trunk/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/editor/FulltextDocumentMaker.java#L150]
adds field_name(ordered(function(name))) as usual, as expected.

I am not sure whether I clarified things or made them more confusing. Maybe we can discuss
it on the call tomorrow.



(Comment author: nitigup)

> Index definition with orderable property definitions with and without functions breaks index
> --------------------------------------------------------------------------------------------
>
>                 Key: OAK-8166
>                 URL: https://issues.apache.org/jira/browse/OAK-8166
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: indexing
>    Affects Versions: 1.8.12
>            Reporter: Tom Blackford
>            Priority: Major
>         Attachments: OAK-8166_1.patch
>
>
> If an index definition contains the same orderable property with and without functions, it will fail to index any node which contains that property. The failure will be logged as [1].
> Steps to reproduce:
> * Configure index with the two property definitions shown at [2].
> * Refresh the index definition
> * Modify a node that falls under the definition - it will fail with the exception shown at [1]
> * Modify the 'non-function' property definition so that it is not ordered (ordered=false)
> * Refresh the index definition
> * Modify the same node - note there is no exception.
> Thanks to [~catholicon] for assistance identifying root cause.
> [1]
> {code}
> 25.03.2019 15:39:04.135 *WARN* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor Failed to index the node [/content/dam/Unknown-2.png]
> java.lang.IllegalArgumentException: DocValuesField ":dvjcr:content/metadata/dc:title" appears more than once in this document (only one value is allowed per field)
> 	at org.apache.lucene.index.SortedDocValuesWriter.addValue(SortedDocValuesWriter.java:62) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.lucene.index.DocValuesProcessor.addSortedField(DocValuesProcessor.java:125) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.lucene.index.DocValuesProcessor.addField(DocValuesProcessor.java:59) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.lucene.index.TwoStoredFieldsConsumers.addField(TwoStoredFieldsConsumers.java:36) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:236) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:455) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1534) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.writer.DefaultIndexWriter.updateDocument(DefaultIndexWriter.java:86) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.addOrUpdate(LuceneIndexEditor.java:258) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:140) [org.apache.jackrabbit.oak-lucene:1.8.9]
> 	at org.apache.jackrabbit.oak.spi.commit.CompositeEditor.leave(CompositeEditor.java:74) [org.apache.jackrabbit.oak-store-spi:1.8.9]
> {code}
> [2] 
> {code}
> "dcTitle": {
>     "jcr:primaryType": "nt:unstructured",
>     "nodeScopeIndex": "true",
>     "useInSuggest": "true",
>     "ordered": "true",
>     "propertyIndex": "true",
>     "useInSpellcheck": "true",
>     "name": "jcr:content/metadata/dc:title",
>     "boost": "2.0"
> },
> "dcTitleLowercase": {
>     "jcr:primaryType": "nt:unstructured",
>     "ordered": "true",
>     "propertyIndex": "true",
>     "function": "fn:lower-case(jcr:content/metadata/@dc:title)"
> }
> {code}
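The triggering configuration from [2] can be expressed as a small check. This is a hypothetical validator, not part of Oak: it flags an ordered plain property definition on a relative path (the case the comment above says is affected) when an ordered function definition refers to the same property. It only handles single-argument functions of the form fn:name(path/@prop), and the PropDef type is invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical configuration check; PropDef and the parsing rule are assumptions.
final class OrderedConflictSketch {

    static final class PropDef {
        final String key;      // node name in the index definition, e.g. "dcTitle"
        final String name;     // "name" property, or null for function definitions
        final String function; // "function" property, or null for plain definitions
        final boolean ordered;

        PropDef(String key, String name, String function, boolean ordered) {
            this.key = key;
            this.name = name;
            this.function = function;
            this.ordered = ordered;
        }
    }

    // "fn:lower-case(jcr:content/metadata/@dc:title)" -> "jcr:content/metadata/dc:title"
    static String propertyOf(String function) {
        String arg = function.substring(function.indexOf('(') + 1, function.lastIndexOf(')'));
        return arg.replace("@", "");
    }

    // Pairs of (plain, function) definitions that are both ordered on the same relative property.
    static List<String> conflicts(List<PropDef> defs) {
        List<String> result = new ArrayList<>();
        for (PropDef plain : defs) {
            if (!plain.ordered || plain.function != null || !plain.name.contains("/")) {
                continue;
            }
            for (PropDef fn : defs) {
                if (fn.ordered && fn.function != null && propertyOf(fn.function).equals(plain.name)) {
                    result.add(plain.key + " / " + fn.key);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<PropDef> defs = List.of(
                new PropDef("dcTitle", "jcr:content/metadata/dc:title", null, true),
                new PropDef("dcTitleLowercase", null, "fn:lower-case(jcr:content/metadata/@dc:title)", true));
        System.out.println(conflicts(defs)); // [dcTitle / dcTitleLowercase]
    }
}
```

Setting ordered to false on dcTitle makes the check return an empty list, which mirrors the workaround in the reproduction steps.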



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
