From: Marcel Reutegger
To: dev@jackrabbit.apache.org
Date: Fri, 20 Jun 2008 15:01:38 +0200
Subject: Re: SimilaritySearch help needed
Hi,

there are certain conditions that must be met for the similarity search to work properly. I've updated the wiki page; please have a look at:

http://wiki.apache.org/jackrabbit/SimilaritySearch

regards
 marcel

jakobitsch juergen wrote:
> Hi there,
>
> I can't get SimilaritySearch to work:
>
> In my environment I'm extracting keyphrases [ kea ] from given texts. I'm
> saving the texts along with the keyphrases and finally want to be able
> to select similar texts based on the extracted keyphrases. I tried a couple
> of XPath queries [ XPath newbie, I should admit ], but none of them worked as I wanted.
>
> At the moment I'm using the following structure for my texts [ I'm willing to change it if necessary ]:
>
> rootNode
>   +2sea:BlogSphere
>     +2sea:BlogEntry [ @2sea:BlogEntryTitle, @2sea:BlogEntryContent ]
>       +2sea:Key [ @2sea:Phrase ]
>
> For example:
>
> 2sea:BlogEntry 1:
>   @2sea:BlogEntryTitle = "aloha 21"
>   @2sea:BlogEntryContent = "java is a wonderful programming language"
>   +2sea:Key
>     @2sea:Phrase = "java"
>   +2sea:Key
>     @2sea:Phrase = "programming language"
>
> 2sea:BlogEntry 2:
>   @2sea:BlogEntryTitle = "welcome 23"
>   @2sea:BlogEntryContent = "programming languages like java or perl..."
>   +2sea:Key
>     @2sea:Phrase = "java"
>   +2sea:Key
>     @2sea:Phrase = "programming language"
>   +2sea:Key
>     @2sea:Phrase = "perl"
>
> So these two entries should be similar based on their Key -> Phrase values.
>
> Is this possible at all, or do I need a certain structure to be able to get similar nodes?
> It might help to just have a property 2sea:KeyPhrases with a String[] as value.
>
> Any help appreciated,
> wkr j
> www.2sea.org
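For reference, Jackrabbit exposes similarity search in XPath through the rep:similar() function described on the wiki page above. The following is a minimal Java sketch, assuming that function, that only builds the query statement; the reference path /blogs/entry1 is hypothetical, and the indexing conditions listed on the wiki page still have to be met before such a query returns useful results.

```java
/**
 * Builds a Jackrabbit XPath statement that asks for nodes similar to a
 * reference node via rep:similar(). Sketch only: it produces the query
 * string and does not talk to a repository.
 */
public class SimilarQuery {

    /** Wraps a value in an XPath string literal, doubling embedded single quotes. */
    public static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Returns an XPath statement matching nodes similar to the node at refPath. */
    public static String similarTo(String refPath) {
        return "//*[rep:similar(., " + quote(refPath) + ")]";
    }

    public static void main(String[] args) {
        // Hypothetical path of the reference blog entry.
        String stmt = similarTo("/blogs/entry1");
        System.out.println(stmt); // prints //*[rep:similar(., '/blogs/entry1')]

        // With the JCR API on the classpath and an open Session "session"
        // (assumed here), the statement would be executed roughly like this:
        //   QueryManager qm = session.getWorkspace().getQueryManager();
        //   NodeIterator it = qm.createQuery(stmt, Query.XPATH).execute().getNodes();
    }
}
```

Since rep:similar() works on the full-text content indexed for a node, keeping all key phrases on the entry node itself (for example as the multi-valued property the poster suggests) may match more reliably than spreading them over child nodes; this is an untested assumption, not something confirmed in this thread.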