From: Robert Muir
To: lucy-dev@incubator.apache.org
Date: Tue, 9 Nov 2010 04:51:33 -0500
Subject: Re: [lucy-dev] Bundling Snowball

On Mon, Nov 8, 2010 at 10:01 PM, Marvin Humphrey wrote:
> The "semiclean" build target has been added.  I opened
> for bundling the Snowball
> stemming library.  A separate issue will follow for bundling the stoplists.
>

Some quick notes, from lucene-java:

* Are you going to do svn checkouts for bundling Snowball? I don't think they are really releasing anymore, but there are in fact new languages, etc., in svn.

* Every so often Snowball changes the rules for some languages, which can be tricky depending on how you handle backwards compatibility. In Lucene Java we have a checkout of revision 502, plus the newer languages (Armenian, Catalan, Basque) added on top. If we fully 'svn updated' to the latest rev, it would change German stemming relative to our previous release, for example, and be a hassle for people who created indexes with those older versions.

* When bundling the stoplists: some languages, even "released" ones (Turkish, Romanian, etc.), don't have Snowball-included stoplists.
  If you want, you could use the ones we have in Lucene to provide stoplists for these languages:

  http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/tr/stopwords.txt
  http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/ro/stopwords.txt
  http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/hy/stopwords.txt
  http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/eu/stopwords.txt
  http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/ca/stopwords.txt

  These are of variable quality. If a file has source information in its header, that means I found a list clearly marked with a BSD or Apache license. If it has no header, it means I made it myself. It might seem absurd to worry about "licensing" for stopwords, but you never know :)
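For illustration only, here is a minimal Python sketch of reading both kinds of stoplist: Snowball's own lists and the Lucene files linked above. It assumes that Snowball-style lists use `|` as the comment marker and may hold several words per line, and that the Lucene `stopwords.txt` files keep their source/license notes in leading `#` comment lines. `load_stoplist` and `header_comments` are hypothetical helpers, not Lucy or Lucene APIs, and the sample data is made up.

```python
from typing import Iterable, List, Set


def load_stoplist(lines: Iterable[str], comment_char: str) -> Set[str]:
    """Collect stopwords, ignoring everything after `comment_char` on a line.

    Assumed layouts (a sketch, not verified against every file):
      * Snowball-style lists: '|' starts a comment; a line may hold
        several whitespace-separated words.
      * Lucene-style stopwords.txt: '#' comment lines, one word per line.
    """
    words: Set[str] = set()
    for line in lines:
        # Keep only the text before the first comment marker, if any.
        content = line.split(comment_char, 1)[0]
        # Split on whitespace; blank or comment-only lines add nothing.
        words.update(content.split())
    return words


def header_comments(lines: Iterable[str], comment_char: str) -> List[str]:
    """Return the leading comment lines (e.g. a source/license note).

    For the Lucene files linked above, an empty result would mean a
    hand-made list with no recorded upstream source (per the message).
    """
    header: List[str] = []
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith(comment_char):
            break  # the first blank or word line ends the header
        header.append(stripped[len(comment_char):].strip())
    return header


if __name__ == "__main__":
    # Illustrative samples only; not copied from Snowball or Lucene.
    snowball_style = [
        " | an illustrative Snowball-style list",
        "i        | trailing comment",
        "me my myself",
    ]
    lucene_style = [
        "# Source: <upstream list>, BSD license (illustrative)",
        "acaba",
        "ama",
    ]
    print(sorted(load_stoplist(snowball_style, "|")))  # ['i', 'me', 'my', 'myself']
    print(sorted(load_stoplist(lucene_style, "#")))    # ['acaba', 'ama']
    print(header_comments(lucene_style, "#"))
```

Passing the comment character explicitly keeps one loader for both layouts, and checking `header_comments` at bundling time would be one way to flag which lists have no recorded source.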