jackrabbit-dev mailing list archives

From Dominik Süß <dominik.su...@gmail.com>
Subject Re: Semantic distance search
Date Sat, 06 Jun 2009 16:26:00 GMT
Thanks for this hint.
So I should probably think about handling this search without Lucene and use
the JCR API for it, or combine the two. The index should still be correct for
the tagging information and should be used there: this way of tagging is
coupled through loose references, so there is no real navigation to follow.
I had hoped I could use the Lucene index for all of it to get better
performance, but I think it's worth trying the other way. Now I have to
figure out how to combine "my" score with the Lucene score.
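
One possible way to combine the two scores, purely as a sketch: bring both into the range 0..1 and take a weighted average. The function names and the `weight` parameter are hypothetical, and nothing here is Jackrabbit or Lucene API; raw Lucene scores are not bounded, so scaling by the best hit is an assumption as well.

```python
def normalise(scores):
    """Scale raw scores to [0, 1] by dividing each by the largest one."""
    top = max(scores.values(), default=0.0)
    if top <= 0.0:
        return {key: 0.0 for key in scores}
    return {key: value / top for key, value in scores.items()}


def combine_scores(lucene_score, structural_score, weight=0.5):
    """Weighted average of two scores that are both assumed to lie in [0, 1].

    weight is the share given to the Lucene score (hypothetical parameter).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * lucene_score + (1.0 - weight) * structural_score
```

The weight would have to be tuned by looking at real results; a plain average is only a starting point.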

Best regards,
Dominik

On Fri, Jun 5, 2009 at 9:24 PM, Alexander Klimetschek <aklimets@day.com> wrote:

> No, the search will work, because the path information is not stored in the
> lucene index - hence no reindex is needed upon a move - and path location
> steps are handled without the lucene index.
>
> Regards,
> Alex
>
> --Alexander Klimetschek @iPhone
>
>
> On 05.06.2009 at 11:58, Dominik Süß <dominik.suess@gmail.com> wrote:
>
> Hi Marcel,
>
> doesn't that mean I can never be sure I'll get a proper result when
> searching for the path of a node?
>
> Best regards,
> Dominik
>
> On Tue, Jun 2, 2009 at 1:43 PM, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:
>
>> Hi,
>>
>> 2009/5/21 Dominik Süß <dominik.suess@gmail.com>:
>> > Hi everybody,
>> >
>> > after some time of indirect contact with JCR through Sling and Day
>> > CRX/CQ, I now think it's time to get in touch with Jackrabbit directly.
>> > As the subject says, I do this with an idea that I'd like to share and
>> > need some help to realize (since my Lucene experience is close to
>> > nothing beyond pure usage & theory). I tried to start with a proof of
>> > concept, but when I looked at the current search implementations in JCR
>> > I realized I need someone who can give me a jump start and take the
>> > first steps together with me. So here goes my idea:
>> >
>> > I recently had some thoughts about something I'd call semantic distance
>> > in multidimensional hierarchies (content structures + hierarchical
>> > tagging like in CQ 5 [1]).
>> >
>> > The task I would like to fulfill: find the semantically closest nodes
>> > for a given node.
>> >
>> > I postulate that the content structure represents semantic relations,
>> > and that the referenced tags are likewise in a hierarchy that
>> > represents semantic relations. Furthermore, I postulate that subnodes
>> > are semantically a subset of the "type" of the parent node (not
>> > thinking of JCR types but of semantic classifications).
>> > This leads to the following thesis: the distance to the closest shared
>> > parent node represents the unidirectional distance of a node to another
>> > node. The result is that a whole branch has the same distance to a node
>> > (which should be correct, since a subnode in the branch belongs to the
>> > parent node that connects the branches we have to look at).
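
The thesis above can be sketched in a few lines, assuming handles are plain slash-separated paths. The function names are hypothetical and this is not Jackrabbit API; it only counts how many steps the source node has to climb to reach the closest ancestor it shares with the target.

```python
def split_handle(handle):
    """Split a handle such as '/content/a/b' into its path segments."""
    return [segment for segment in handle.split("/") if segment]


def distance_to_shared_parent(source, target):
    """Steps from source up to the closest ancestor it shares with target.

    The result depends on the direction: only the depth of source below
    the shared ancestor counts, so every node in the target's branch gets
    the same distance. If target lies below source, the distance is 0.
    """
    src, tgt = split_handle(source), split_handle(target)
    shared = 0
    for a, b in zip(src, tgt):
        if a != b:
            break
        shared += 1
    return len(src) - shared
```

For example, seen from /content/products/shoes/sneaker, both /content/products/hats/cap and /content/products/hats/cap/red are two steps away, because the shared parent is /content/products in both cases.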
>> >
>> > Producing an index for this seems to be really hard, so I rethought my
>> > assumptions and came up with the following way of determining the
>> > distance without indexing the explicit distance (I came up with this
>> > after reading a bit about Analyzers and stemming).
>> >
>> > 1. For indexing, all referenced tag handles and the node's own handle
>> > are taken into account.
>> > 2. An analyzer produces string tokens out of each handle. Each handle is
>> > split into multiple handles by removing the last node until the root
>> > node is reached (so the node and every parent node are indexed for this
>> > node, as well as for each referenced tag).
>>
>> this will only work as long as you don't move nodes. moving a node in
>> jackrabbit is a light weight operation, which means only the moved
>> node is re-indexed. all descendant nodes are kept untouched even
>> though their path (handle) changed!
>>
>> regards
>>  marcel
>>
>> > 3. The query has to be built from a given handle, since I want to
>> > search for the nodes semantically closest to it.
>> > 4. The query is built the same way the analyzer works: the handle is
>> > split into all of its parent handles.
>> > Result: a 100% match can only be produced for the same node (for all
>> > other nodes at least the node's own handle is missing). The
>> > "semantically" closer a node is, the more handles will match, which
>> > results in the ordering I intended. Et voilà, we have all we need to
>> > search for semantically close pages in a proper sorting order.
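
Steps 1 to 4 can be tried out without Lucene, as a rough sketch. Everything here is an assumption for illustration: the function names are made up, the root handle itself is left out of the tokens (it would match every node), and the score is a plain fraction of matching handles rather than Lucene's own scoring.

```python
def ancestor_handles(handle):
    """Return the handle and each of its parent handles, root excluded."""
    segments = [segment for segment in handle.split("/") if segment]
    return ["/" + "/".join(segments[:i]) for i in range(len(segments), 0, -1)]


def index_tokens(own_handle, tag_handles):
    """Tokens for one node: its own handle chain plus each tag's chain."""
    tokens = set(ancestor_handles(own_handle))
    for tag in tag_handles:
        tokens.update(ancestor_handles(tag))
    return tokens


def rank_by_overlap(query_tokens, documents):
    """Rank nodes by the fraction of query tokens they share.

    documents maps a node handle to its token set. Ties are broken by
    handle so the order is stable.
    """
    if not query_tokens:
        return []
    scored = [
        (handle, len(query_tokens & tokens) / len(query_tokens))
        for handle, tokens in documents.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Building the query tokens with index_tokens for the node in question gives that node itself a score of 1.0, and every other node a lower one. Marcel's point above still applies: tokens stored this way go stale for all descendants as soon as an ancestor node is moved.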
>> >
>> > I might have a gap in my conclusions that I haven't noticed yet. I'd
>> > love to have some feedback and would appreciate some help getting
>> > started with the mentioned proof of concept.
>> >
>> > WDYT?
>> >
>> > Best regards,
>> > Dominik
>> >
>> > [1] http://dev.day.com/microsling/content/blogs/main/cq5tags.html
>> >
>>
>
>
