From: Ahmet Arslan
Date: Tue, 6 Jul 2010 06:40:54 -0700 (PDT)
Subject: Re: multi-term synonym expansion
To: java-user@lucene.apache.org

> My custom SKOSAnalyzer already performs synonym expansion
> based on the labels defined in a given SKOS model. But now I
> have the problem that real-world thesauri often define
> (multi-term) synonyms for multi-term words. Here is an
> example that defines the abbreviation "UN" as a synonym for
> "United Nations":
>
>       United Nations
>       UN
>
> At the end, the analyzer should add the term UN at the right
> position in the index.
> Taking the example above, a sentence
> "I work for the United Nations" should appear in the index
> as
>
> 2: [work: 2->6]
> 5: [united nations: 15->29] [un: 15->29]
>
> ...so that a query "I work for the UN" also matches the
> document.
>
> What is the best way to implement that? With a
> TokenFilter I can work through the sentence token by token
> (using incrementToken()) and check if there is a synonym
> available. How can I analyze token sequences in a given
> text? Do I need to implement a custom tokenizer that
> recognizes entities based on a given dictionary?
>
> I am grateful for any suggestions or advice.

Solr's SynonymFilterFactory can handle multi-word synonyms, which may help here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
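
On the question of how to look at token sequences rather than single tokens: one possible approach is to buffer a few tokens ahead and try the longest phrase first. The sketch below shows that idea in plain Java, with no Lucene dependency. It is not the Solr or Lucene implementation; the class `MultiWordSynonymSketch`, the `Tok` type, and the `expand` method are hypothetical names made up for illustration, and a position increment of 0 stands in for what Lucene expresses through its position increment attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of Lucene or Solr.
class MultiWordSynonymSketch {

    /** A term plus its position increment (0 = same position as the previous token). */
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    /**
     * Scans the tokens left to right. At each position it tries the longest
     * phrase (at most maxWords tokens) that has a synonym. On a match it keeps
     * the phrase tokens and stacks the synonym on the first phrase token.
     */
    static List<Tok> expand(List<String> tokens, Map<String, String> synonyms, int maxWords) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            String synonym = null;
            // Longest match first, so "united nations" wins over a shorter entry.
            for (int len = Math.min(maxWords, tokens.size() - i); len >= 1; len--) {
                String phrase = String.join(" ", tokens.subList(i, i + len));
                synonym = synonyms.get(phrase);
                if (synonym != null) {
                    matched = len;
                    break;
                }
            }
            // The current token is always emitted at a new position.
            out.add(new Tok(tokens.get(i), 1));
            if (matched == 0) {
                i++;
                continue;
            }
            // Synonym shares the position of the first token of the phrase.
            out.add(new Tok(synonym, 0));
            for (int k = 1; k < matched; k++) {
                out.add(new Tok(tokens.get(i + k), 1));
            }
            i += matched;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> synonyms = new HashMap<>();
        synonyms.put("united nations", "un");
        List<String> tokens = Arrays.asList("i", "work", "for", "the", "united", "nations");
        System.out.println(expand(tokens, synonyms, 2));
    }
}
```

With the sample sentence, "un" ends up at the same position as "united", so a phrase such as "for the un" lines up with "for the united" in the token stream. Unlike the index layout in the question, this sketch keeps "united" and "nations" as separate tokens and does not track character offsets or remove stop words.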