jackrabbit-users mailing list archives

From Clay Ferguson <wcl...@gmail.com>
Subject Re: Custom index type
Date Fri, 14 Oct 2016 16:28:47 GMT
The "Traversed 210000 nodes" warning is really telling you (I think) that the
query engine was unable to use your indexes to perform the search. It doesn't
mean that too many results were found, just that the right indexes for this
search don't exist. Create an index for each property, then search them the
normal way (using '=' instead of the LIKE clause), and I bet you will see good
performance. If you genuinely have thousands of key/value pairs to search, it
is possible that your full-text approach is the best-performing solution, but
I'm not sure.
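
To illustrate the '=' suggestion, here is a minimal sketch that assembles an equality-based JCR-SQL2 query for a list of tags. The function name build_tag_query and its parameters are made up for this example, and it assumes a property index on cq:tags already exists; the node type, path, and property names are taken from the query quoted below.

```python
def build_tag_query(root_path, tags, node_type="cq:PageContent",
                    order_by="cq:lastModified"):
    """Build a JCR-SQL2 query matching nodes under root_path whose
    cq:tags property equals any of the given tag values.

    Hypothetical helper for illustration only; not part of any Oak API.
    """
    if not tags:
        raise ValueError("at least one tag is required")

    # Single quotes inside a JCR-SQL2 string literal are escaped by doubling.
    clauses = [
        "[cq:tags] = '{}'".format(tag.replace("'", "''")) for tag in tags
    ]
    condition = " OR ".join(clauses)

    return (
        "SELECT * FROM [{node_type}] AS b "
        "WHERE ISDESCENDANTNODE(b, [{root}]) "
        "AND ({condition}) ORDER BY [{order_by}]"
    ).format(node_type=node_type, root=root_path,
             condition=condition, order_by=order_by)


if __name__ == "__main__":
    print(build_tag_query("/content/guides",
                          ["location:europe", "type:waterfalls"]))
```

Note that '=' matches only the exact values listed, so unlike the LIKE 'location:europe%' form it would not pick up descendant tags unless each one is added to the list explicitly.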

However, your general question is: "What's the best way to query for LARGE
NUMBERS of key/value pairs?"

Maybe some experts who know more than me about Oak can reply to that
simplified version of your question.

Best regards,
Clay Ferguson

On Fri, Oct 14, 2016 at 10:29 AM, rachna <rachana.mehta@telegraph.co.uk> wrote:

> Thanks Clay & Thomas.
> Taking a step back from our problem has helped us look at it in a different
> way.
> The tag property also stores its values in a specific format that shows the
> tree structure.
> cq:tags
> - location:europe
> - type:waterfalls
> Therefore, instead of traversing the repository to identify the descendants
> of these tags, we could use a LIKE query:
> SELECT * FROM [cq:PageContent] AS b WHERE ISDESCENDANTNODE(b,
> [/content/guides]) AND ([cq:tags] LIKE 'location:europe%' OR [cq:tags] LIKE
> 'type:waterfalls%') ORDER BY [cq:lastModified]
> However, since our repository contains a large number of items that match
> these criteria, we start to see warnings about traversing the index:
> org.apache.jackrabbit.oak.plugins.index.property.strategy.ContentMirrorStoreStrategy
> Traversed 210000 nodes (210164 index entries) using index jcr:primaryType
> with filter Filter(query=SELECT * FROM [cq:PageContent] AS b WHERE
> ISDESCENDANTNODE(b, [/content/guides]) AND ([cq:tags] LIKE
> 'location:europe%' OR [cq:tags] LIKE 'type:waterfalls%') ORDER BY
> [cq:lastModified], path=/content/guides//*, property=[cq:tags=[is not
> null]])
> Instead, I created a Lucene index that indexes the cq:tags (with full text)
> and cq:lastModified (with ordered support) properties.
> e.g. SELECT [jcr:path] FROM [cq:PageContent] AS b WHERE ISDESCENDANTNODE(b,
> [/content/guides]) AND (CONTAINS([cq:tags], 'location:europe') OR
> CONTAINS([cq:tags], 'type:waterfalls')) ORDER BY [cq:lastModified]
> That seems to be much faster than using a property index and should solve
> most of the issues that we might have (hopefully avoiding creating a new
> index).
> Is there any support in the Lucene index for something like STARTSWITH
> rather than CONTAINS?
> The maxClauseCount configuration parameter introduced the soft limit of
> 1024, which is part of Jackrabbit 2.
> We have been attempting to move to Oak; however, our progress has been slow
> due to repository inconsistencies.
> I realise this value is configurable; however, constantly increasing it
> doesn't sound like the right thing to do.
> Thanks,
> Rachna
> --
> View this message in context:
> http://jackrabbit.510166.n4.nabble.com/Custom-index-type-tp4665031p4665121.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
