From: Lukáš Vlček
Date: Mon, 20 Jun 2011 13:19:02 +0200
Subject: Re: KStem custom lexicons configuration possible?
To: dev@lucene.apache.org

Maybe I should show some examples where I think custom configuration can be
useful. Here are two:

1) As of now, KStem conflates both "connector" and "connected" to the same
term, "connect".
2) Conversely, it does not conflate "transaction" and "transactions" to the
same term.

With an option to modify the internal lexicons, I would be able to adapt
KStem to work better for specific text corpora.

What do you think?

Regards,
Lukas

On Mon, Jun 20, 2011 at 12:55 PM, Lukáš Vlček wrote:

> Hi,
>
> Is there any API in the KStem filter for configuring lexicons?
>
> As far as I understand, the original code loads lexicons from files at
> startup (see http://lexicalresearch.com/kstem-doc.txt). The author (Robert
> Krovetz) names the ability to modify lexicons among the advantages of KStem
> compared to other stemmers.
>
> Do people not need it? Would it be a useful addition for the KStem filter
> to allow custom lexicon configurations in its API?
>
> Regards,
> Lukas
>
> Note: Big kudos to all who participated in bringing KStem into Lucene!
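
To make the request more concrete, here is a minimal sketch of the kind of
lexicon override described in the two examples. It is plain Python, not the
Lucene API, and every name in it (make_stemmer, toy_base_stem,
custom_lexicon) is hypothetical. toy_base_stem is only a placeholder that
strips two suffixes so that the examples behave as described; it is not the
KStem algorithm.

```python
from typing import Callable, Mapping


def make_stemmer(base_stem: Callable[[str], str],
                 overrides: Mapping[str, str]) -> Callable[[str], str]:
    """Wrap base_stem so that entries in overrides take precedence."""
    # Normalize override keys once so lookups are case-insensitive.
    table = {word.lower(): stem for word, stem in overrides.items()}

    def stem(word: str) -> str:
        key = word.lower()
        if key in table:
            # The custom lexicon wins over the built-in behaviour.
            return table[key]
        return base_stem(key)

    return stem


def toy_base_stem(word: str) -> str:
    """Placeholder stemmer (assumption): strips "ed"/"or". NOT KStem."""
    for suffix in ("ed", "or"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical user-supplied lexicon for a specific corpus.
custom_lexicon = {
    "connector": "connector",       # keep distinct from "connect"
    "transactions": "transaction",  # conflate plural with singular
}

stem = make_stemmer(toy_base_stem, custom_lexicon)
```

With this lexicon, stem("connector") stays "connector" and
stem("transactions") becomes "transaction", while words absent from the
lexicon, such as "connected", still go through the base stemmer.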