jackrabbit-dev mailing list archives

From Alexander Klimetschek <aklim...@day.com>
Subject Re: Semantic distance search
Date Fri, 05 Jun 2009 19:24:48 GMT
No, the search will work, because the path information is not stored  
in the lucene index - hence no reindex is needed upon a move - and  
path location steps are handled without the lucene index.

Regards,
Alex

--
Alexander Klimetschek @iPhone


On 05.06.2009 at 11:58, Dominik Süß <dominik.suess@gmail.com> wrote:

> Hi Marcel,
>
> doesn't that mean I can never be sure I'll get a proper result when
> searching for the path of a node?
>
> Best regards,
> Dominik
>
> On Tue, Jun 2, 2009 at 1:43 PM, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:
> Hi,
>
> 2009/5/21 Dominik Süß <dominik.suess@gmail.com>:
> > Hi everybody,
> >
> > after some time of indirect contact with JCR through Sling and Day
> > CRX/CQ, I think it's time to get in touch with Jackrabbit directly. As
> > the subject says, I have an idea that I'd like to share and need some
> > help to realize (my Lucene experience is close to nothing beyond pure
> > usage and theory). I tried to start with a proof of concept, but when
> > I looked at the current search implementations in JCR I realized that I
> > need someone who can give me a jump start and take the first steps
> > together with me. So here is my idea:
> >
> > I recently had some thoughts about something I'd call semantic distance
> > in multidimensional hierarchies (content structures + hierarchical
> > tagging like in CQ 5 [1]).
> >
> > The task I would like to fulfill: find the semantically closest nodes
> > for a given node.
> >
> > I postulate that the content structure represents semantic relations,
> > and that the referenced tags are likewise in a hierarchy that represents
> > semantic relations. Furthermore, I postulate that subnodes are
> > semantically a subset of the "type" of the parent node (not thinking of
> > JCR node types but of semantic classifications).
> > This leads to the following thesis: the distance to the closest shared
> > parent node represents the unidirectional distance of a node to another
> > node. The result is that a whole branch has the same distance to a node
> > (which should be correct, since the subnode in the branch belongs to the
> > parent node that connects the branches we have to look at).
> >
> > Figuring out a good way to produce an index for this seems to be really
> > hard, so I rethought my assumptions and came up with the following way
> > of determining the distance without indexing the explicit distance (I
> > came up with this after reading a bit about Analyzers and stemming).
> >
> > 1. For indexing, all referenced tag handles and the node's own handle
> > are taken into account.
> > 2. An analyzer produces string tokens out of each handle. Each handle is
> > split up into multiple handles by removing the last node until the root
> > node is reached (so the node and every parent node is indexed for this
> > node, as well as for each referenced tag).
>
> this will only work as long as you don't move nodes. moving a node in
> jackrabbit is a light weight operation, which means only the moved
> node is re-indexed. all descendant nodes are kept untouched even
> though their path (handle) changed!
>
> regards
>  marcel
>
> > 3. The query has to be built based on a given handle, since I want to
> > search for the semantically closest nodes.
> > 4. The query is built the same way the analyzer splits the handle into
> > all parent handles.
> > Result: a 100% match can only be produced for the same node (for all
> > other nodes at least the node's own handle is missing). The
> > "semantically" closer a node is, the more handles will match, which
> > results in the ordering I intended. Et voilà, we have all we need to
> > search for semantically close pages in a proper sort order.
> >
> > I might have a gap in my conclusions that I haven't noticed yet. I'd
> > love to have some feedback and would appreciate some help getting
> > started with the mentioned proof of concept.
> >
> > WDYT?
> >
> > Best regards,
> > Dominik
> >
> > [1] http://dev.day.com/microsling/content/blogs/main/cq5tags.html
> >
>
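
Steps 1 to 4 of the quoted proposal can be sketched in plain Python, outside of Jackrabbit and Lucene. This is only an illustration under stated assumptions: the function names, the example handles, and the score (the fraction of query tokens a node shares) are made up here and are not taken from the thread; Lucene's real scoring works differently. Whether the root handle "/" counts as a token is also an assumption.

```python
def ancestor_handles(handle):
    """Return the handle and every parent handle up to the root, longest first."""
    parts = [p for p in handle.strip("/").split("/") if p]
    # Drop the last path segment step by step, then add the root (assumption).
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)] + ["/"]


def index_tokens(handle, tag_handles=()):
    """Tokens for one node: its own ancestor handles plus those of each tag."""
    tokens = set()
    for h in (handle, *tag_handles):
        tokens.update(ancestor_handles(h))
    return tokens


def rank(query_handle, query_tags, index):
    """Order indexed nodes by the fraction of query tokens they contain."""
    query = index_tokens(query_handle, query_tags)
    scores = {node: len(query & tokens) / len(query)
              for node, tokens in index.items()}
    # Highest score first; ties are broken by handle to keep the order stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical content: node handle -> handles of the tags it references.
nodes = {
    "/content/site/en/products/a": ["/etc/tags/color/red"],
    "/content/site/en/products/b": ["/etc/tags/color/red"],
    "/content/site/en/about": ["/etc/tags/company"],
    "/content/site/de/products/a": ["/etc/tags/color/blue"],
}
index = {handle: index_tokens(handle, tags) for handle, tags in nodes.items()}

start = "/content/site/en/products/a"
ranked = rank(start, nodes[start], index)
for handle, score in ranked:
    print(f"{score:.2f}  {handle}")
```

In this sketch the start node scores 1.0 and its sibling with the same tag comes next, which matches the intended ordering. Note that a descendant of the start node carrying the same tags would also contain every query token, so with this particular score it would reach 1.0 as well. The sketch also stores handles as tokens, so it is exposed to the move problem Marcel describes: tokens stored for descendants of a moved node would be stale until those nodes are re-indexed.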
