Message-ID: <4E4CEA35.1040103@gmail.com>
Date: Thu, 18 Aug 2011 12:32:21 +0200
From: Jörn Kottmann
To: opennlp-dev@incubator.apache.org
Subject: Re: Stemmer
References: <4E4CE74A.7020000@gmail.com>

On 8/18/11 12:24 PM, Olivier Grisel wrote:
> Is this better or cover more languages than what's already provided by
> Apache Lucene? Maybe it should better be contributed to the Lucene
> project and make it easy to use the generic, battle tested Lucene
> analyzers / tokenizers infrastructure to generate features in OpenNLP.

The OpenNLP APIs are not designed to work on token streams. Instead, a
user usually has to provide an entire sentence at once, so the two do
not fit together well.

And since we are an NLP library, I believe it is absolutely fine to
implement our own stemming here.

Jörn