From: Shashi Kant
Date: Sat, 9 Jan 2010 07:19:07 -0500
Subject: Re: Search query problem
To: java-user@lucene.apache.org

Couldn't you just modify the PorterStemmer class for your requirements? (We
did, and gave it a list of ignore words and phrases specific to our needs.)

On Sat, Jan 9, 2010 at 4:00 AM, Jamie wrote:
> Hi All
>
> Is there another stemmer we can use that is perhaps not as aggressive as the
> Porter Stemmer? I.e., the stemming could remove ing's, er's, but not
> something so significant as to convert "Lowe's" to "Low".
>
> Thanks
>
> Jamie
>
> Will Murnane wrote:
>>
>> On Fri, Jan 8, 2010 at 16:27, Jamie wrote:
>>
>>>
>>> Hi Ian / Will
>>>
>>> Thanks. Surely, the Porter Stemmer should not stem proper nouns. I.e., it
>>> could check the capitalization of the first letter of a word and whether
>>> or not the word is the start of a sentence. If so, it could choose not to
>>> apply any stemming. Or am I completely out of whack?
>>>
>>
>> Look again: you're downcasing the terms before the Porter filter ever
>> sees them (which is, AIUI, necessary). You might do well to combine
>> the tokenizing and downcasing step with some heuristic to find proper
>> nouns and not downcase or stem them.
>>
>> Will
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
> --
> Stimulus Software - MailArchiva
> Email Archiving And Compliance
> USA Tel: +1-713-343-8824 ext 100
> UK Tel: +44-20-80991035 ext 100
> Email: jamie@stimulussoft.com
> Web: http://www.mailarchiva.com
> To receive MailArchiva Enterprise Edition product announcements, send a
> message to:
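
For illustration only, the two ideas in this thread (an ignore list that is
consulted before stemming, and a capitalization heuristic for proper nouns)
might be sketched in plain Java roughly as below. Everything here is an
assumption made for the example: the class name, the analyze and lightStem
methods, and the suffix list are hypothetical, and lightStem is a crude
stand-in rather than the Porter algorithm or any Lucene API. As Will points
out, the capitalization check has to run before the term is lowercased.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ProtectedStemmerSketch {

    // Hypothetical stand-in for a real stemmer. It only strips a possessive
    // "'s" and then at most one of a few common suffixes. It is NOT the
    // Porter algorithm; a real stemmer would be called here instead.
    static String lightStem(String word) {
        if (word.endsWith("'s")) {
            word = word.substring(0, word.length() - 2);
        }
        for (String suffix : new String[] {"ing", "er", "s"}) {
            // Keep at least three characters of the word.
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Splits on whitespace, lowercases every term, and stems it unless it is
    // in the ignore list or looks like a proper noun (capitalized and not
    // the first word of a sentence).
    static List<String> analyze(String text, Set<String> protectedWords) {
        List<String> terms = new ArrayList<>();
        boolean sentenceStart = true;
        for (String raw : text.trim().split("\\s+")) {
            // Trim leading/trailing punctuation; inner apostrophes survive.
            String token = raw.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!token.isEmpty()) {
                // Inspect the original case BEFORE lowercasing.
                boolean properNoun =
                        Character.isUpperCase(token.charAt(0)) && !sentenceStart;
                String lower = token.toLowerCase(Locale.ROOT);
                if (properNoun || protectedWords.contains(lower)) {
                    terms.add(lower);            // lowercased, but not stemmed
                } else {
                    terms.add(lightStem(lower)); // lowercased and stemmed
                }
            }
            // The next token starts a sentence if this one ended with . ! or ?
            sentenceStart = raw.endsWith(".") || raw.endsWith("!") || raw.endsWith("?");
        }
        return terms;
    }

    public static void main(String[] args) {
        // Proper-noun heuristic: "Lowe's" is capitalized mid-sentence.
        System.out.println(analyze("I was searching for Lowe's", Set.of()));
        // Ignore list: covers the sentence-initial case the heuristic misses.
        System.out.println(analyze("Lowe's sells tools.", Set.of("lowe's")));
    }
}
```

In this sketch the protected terms are still lowercased so that a lowercase
query can match them; only the stemming is skipped. The heuristic cannot
tell a sentence-initial proper noun from an ordinary capitalized first word,
which is where an explicit ignore list is the more reliable of the two.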