Thanks for this hint.
So I should probably think about handling this search without Lucene and using the JCR API for it, or combining the two (for the tagging information, which should be correct and should be used, since there is no real navigation: the coupling for this way of tagging is based on loose references).
I had hoped I could use the Lucene index for all of it to get better performance, but I think it's worth trying it the other way. Now I have to figure out how to combine "my" score with the Lucene score.

Best regards,

On Fri, Jun 5, 2009 at 9:24 PM, Alexander Klimetschek <> wrote:
No, the search will work, because the path information is not stored in the lucene index - hence no reindex is needed upon a move - and path location steps are handled without the lucene index.


Alexander Klimetschek @iPhone

On 05.06.2009 at 11:58, Dominik Süß <> wrote:

Hi Marcel,

Doesn't that mean I can never be sure I'll get a proper result when searching by the path of a node?

Best regards,

On Tue, Jun 2, 2009 at 1:43 PM, Marcel Reutegger <> wrote:

2009/5/21 Dominik Süß <>:
> Hi everybody,
> after some time of indirect contact with JCR through Sling and Day
> CRX/CQ, I now think it's time to get in touch with Jackrabbit directly. As
> the subject says, I do this after having an idea which I'd like to share and
> need some help to realize (since my Lucene experience is close to nothing
> beyond pure usage & theory). I did try to start with a proof of concept, but
> when I looked into the current search implementations in JCR I had to realize
> I need someone who could give me a jumpstart and take the first steps together
> with me. So here I go with my idea:
> I recently had some thoughts about something I'd call semantic distance in
> multidimensional hierarchies (content structures + hierarchical tagging like
> in CQ 5 [1]).
> The task I would like to fulfill: Find the semantically closest nodes for a
> given node.
> I postulate that the structure represents semantic relations, and that the
> referenced tags are in a hierarchy that represents semantic relations as well.
> Furthermore, I postulate that subnodes are semantically a subset of the "type"
> of the parent node (thinking not of JCR types but of semantic classifications).
> This leads to the following thesis: The distance to the closest shared
> parent node represents the unidirectional distance of a node to another node.
> The result is that a whole branch has the same distance to a node (which
> should be correct, since the subnode in the branch belongs to the parent node
> which connects the branches we have to look at).
> Figuring out a good way to produce an index for this really seems to
> be hard, so I rethought my assumptions and came up with the following way of
> determining the distance without indexing the explicit distance (I came up
> with this thought after reading a bit about Analyzers and stemming).
> 1. For indexing, all referenced tag handles and the node's own handle are
> taken into account.
> 2. An analyzer produces string tokens out of each handle. Each handle is
> split up into multiple handles by removing the last node until the root node
> is reached (so the node and every parent node are indexed for this node, as
> well as for each referenced tag).

This will only work as long as you don't move nodes. Moving a node in
Jackrabbit is a lightweight operation, which means only the moved
node is re-indexed. All descendant nodes are kept untouched even
though their path (handle) changed!
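
To illustrate the point about moves, here is a minimal Python sketch (not
Jackrabbit or Lucene code). The ancestor_handles helper, the in-memory index
dict and the example paths are all hypothetical; leaving out the root handle
"/" is also an assumption, made because it would match every node.

```python
def ancestor_handles(handle):
    """Return the handle itself plus every ancestor handle below the root."""
    parts = [p for p in handle.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]


# Hypothetical stand-in for the search index: node id -> handle tokens
# that were stored when the node was last indexed.
index = {}


def index_node(node_id, handle):
    index[node_id] = set(ancestor_handles(handle))


index_node("parent", "/content/news/2009")
index_node("child", "/content/news/2009/article")

# Move "parent" to /content/archive/2009. Only the moved node is re-indexed;
# the descendant keeps the tokens derived from its old path.
index_node("parent", "/content/archive/2009")

child_has_stale_token = "/content/news/2009" in index["child"]
child_has_new_token = "/content/archive/2009" in index["child"]
```

After the move, the child's stored tokens still describe the old location, so
a query built from its new handle would not match them until the child itself
is re-indexed.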


> 3. The query has to be built based on a given handle, since I want to search
> for the semantically closest nodes.
> 4. The query is built the same way the Analyzer splits the handle
> into all parent handles.
> Result: A 100% match can only be produced for the same node (for all other
> nodes at least the own handle of the node is missing). The "semantically"
> closer a node is, the more handles will match, which will result in the
> ordering I intended. Et voilà, we have all we need to search for
> semantically close pages in a proper sorting order.
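
Steps 3 and 4 and the expected ordering could be sketched roughly as follows.
This is plain Python, not Lucene: the score is simply the fraction of query
handles that a node shares, which only approximates how Lucene would rank the
matches. The helper names, paths and tags are made up for illustration.

```python
def ancestor_handles(handle):
    """Return the handle itself plus every ancestor handle below the root."""
    parts = [p for p in handle.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def tokens_for(own_handle, tag_handles):
    """Tokens for a node: its own handle, its tag handles, and all ancestors."""
    tokens = set()
    for handle in [own_handle, *tag_handles]:
        tokens.update(ancestor_handles(handle))
    return tokens


def rank(query_tokens, indexed):
    """Order nodes by the share of query tokens they match (assumed scoring)."""
    scores = {
        node: len(query_tokens & tokens) / len(query_tokens)
        for node, tokens in indexed.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical content: each node references one hierarchical tag.
nodes = {
    "/content/a/x": ["/tags/sport/football"],
    "/content/a/y": ["/tags/sport/tennis"],
    "/content/b/z": ["/tags/music"],
}
indexed = {handle: tokens_for(handle, tags) for handle, tags in nodes.items()}

# The query is built from the given node the same way the index tokens are.
query = tokens_for("/content/a/x", nodes["/content/a/x"])
ranking = rank(query, indexed)
```

Here the query node itself matches every handle, its sibling with a related
tag comes next, and the node in another branch with an unrelated tag comes
last. With this simple measure a descendant of the query node that carries the
same tags would also match every query handle.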
> I might have a gap in my conclusions that I haven't noticed yet. I'd love to
> have some feedback and would appreciate some help to get started with the
> mentioned proof of concept.
> Best regards,
> Dominik
> [1]