Doesn't that mean I can never be sure I'll get a proper result when searching for the path of a node?
Dominik
On Tue, Jun 2, 2009 at 1:43 PM, Marcel Reutegger <firstname.lastname@example.org> wrote:
2009/5/21 Dominik Süß <email@example.com>:
> Hi everybody,
> after some time of indirect contact with JCR through Sling and Day
> CRX/CQ, I now think it's time to get in touch with Jackrabbit directly. As
> the subject says, I do this after having an idea which I'd like to share and
> need some help to realize (since my Lucene experience is close to nothing
> but pure usage & theory). I tried to start with a proof of concept, but as
> I looked into the current implementations of search in JCR I had to realize I
> need someone who could give me a jumpstart and take the first steps together
> with me. So here I go with my idea:
> I recently had some thoughts about something I'd call semantic distance in
> multidimensional hierarchies (content structures + hierarchical tagging like
> in CQ 5 ).
> The task I would like to fulfill: Find the semantically closest nodes for a
> given node.
> I postulate that structure represents the semantic relation, and that the
> referenced tags are in a hierarchy that represents semantic relations as well.
> Furthermore, I postulate that subnodes are semantically a subset of the "type" of
> the parent node (not thinking of JCR types but of semantic classifications).
> This leads to the following thesis: The distance to the closest shared
> parent node represents the unidirectional distance of a node to another node.
> The result is that a whole branch has the same distance to a node (which
> should be correct, since the subnode in the branch belongs to the parent node
> which connects the branches we have to look at).
> Figuring out a good way to produce an index for this really seems to
> be hard, so I rethought my assumptions and came up with the following way of
> determining the distance without indexing the explicit distance (I came up
> with this thought after reading a bit about Analyzers and stemming).
> 1. For indexing, all referenced tag handles and the node's own handle will be
> taken into account.
> 2. An analyzer produces string tokens out of each handle. Each handle will be
> split up into multiple handles by removing the last node until the root node is
> reached (so the node and every parent node is indexed for this node as well
> as for each referenced tag).
This will only work as long as you don't move nodes. Moving a node in
Jackrabbit is a lightweight operation, which means only the moved
node is re-indexed. All descendant nodes are kept untouched even
though their path (handle) changed!
> 3. The query has to be built based on a given handle, since I want to search for
> the semantically closest nodes.
> 4. The query is built the same way the Analyzer splits the handle
> into all parent handles.
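The handle splitting described in steps 2 and 4 can be sketched in plain Python, without Lucene. This is only an illustration under assumptions: the function names `ancestor_handles` and `index_tokens` are hypothetical, handles are assumed to be absolute "/"-separated paths, and the root "/" is assumed to be emitted as a token as well (the text leaves that open).

```python
def ancestor_handles(handle: str) -> list[str]:
    """Return the handle itself followed by every parent handle up to the root.

    Assumption: handles are absolute, "/"-separated paths and the root "/"
    is included as the last token.
    """
    parts = [p for p in handle.split("/") if p]
    # Drop the last path segment one at a time: /a/b/c -> /a/b -> /a
    handles = ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]
    handles.append("/")
    return handles


def index_tokens(own_handle: str, tag_handles: list[str]) -> set[str]:
    """Collect the tokens for one node: its own handle, the handles of all
    referenced tags, and every parent handle of each of them."""
    tokens: set[str] = set()
    for handle in [own_handle, *tag_handles]:
        tokens.update(ancestor_handles(handle))
    return tokens


if __name__ == "__main__":
    print(ancestor_handles("/content/site/en"))
    print(sorted(index_tokens("/content/site/en", ["/etc/tags/color/red"])))
```

The same `ancestor_handles` call would be used both when indexing a node and when building the query for a given handle, so that both sides produce comparable tokens.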
> Result: A 100% match can only be produced for the same node (for all other
> nodes at least the own handle of the node is missing). The "semantically"
> closer a node is, the more handles will match, which will result in an ordering
> as I intended. Et voilà, we have all we need to search for
> semantically close pages in a proper sorting order.
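To make the intended ordering concrete, here is a minimal sketch that ranks nodes by how many of the query's handle tokens they share. It is a simplification under assumptions: the score is a plain overlap count, whereas real Lucene scoring also weights terms, and the node paths, tag paths and the names `tokens_for` and `rank` are made up for illustration.

```python
def ancestor_handles(handle: str) -> list[str]:
    """Handle plus all parent handles up to the root (root included by assumption)."""
    parts = [p for p in handle.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)] + ["/"]


def tokens_for(own_handle: str, tag_handles: list[str]) -> set[str]:
    """Tokens of one node: own handle, tag handles, and all their parents."""
    tokens: set[str] = set()
    for handle in [own_handle, *tag_handles]:
        tokens.update(ancestor_handles(handle))
    return tokens


def rank(query_handle: str, nodes: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Order all nodes by the number of tokens shared with the query node.

    `nodes` maps each node handle to the handles of its referenced tags.
    Ties are broken alphabetically by handle to keep the result stable.
    """
    query = tokens_for(query_handle, nodes[query_handle])
    scores = {h: len(query & tokens_for(h, tags)) for h, tags in nodes.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample content, not taken from any real repository.
NODES = {
    "/content/site/en/products/a": ["/etc/tags/color/red"],
    "/content/site/en/products/b": ["/etc/tags/color/red"],
    "/content/site/en/news/c": ["/etc/tags/color/blue"],
    "/content/site/de/home": [],
}

if __name__ == "__main__":
    for handle, score in rank("/content/site/en/products/a", NODES):
        print(score, handle)
```

In this sample the query node itself scores highest, its sibling with the same tag comes next, and nodes that share fewer parent handles follow. Note that the tokens are computed from the paths as given; an index that still holds tokens from before a move would score against the old paths.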
> I might have a gap in my conclusions that I haven't realized yet. I'd love to
> have some feedback and would appreciate some help to get started with the
> mentioned proof of concept.
> Best regards,
>  http://dev.day.com/microsling/content/blogs/main/cq5tags.html