Date: Wed, 17 Nov 2004 18:08:03 -0600
From: Chris Lamprecht
To: Lucene Users List
Subject: Re: Considering intermediary solution before Lucene question
Message-ID: <88c6a67204111716084e343252@mail.gmail.com>

John,

It should actually be pretty easy to use just the parts of Lucene you want (the analyzers, etc.) without using the rest. See the PorterStemmer example in this article:

http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html?page=2

You could feed a Reader to the tokenStream() method of PorterStemAnalyzer and get back a TokenStream, from which you pull the tokens using the next() method.

On Wed, 17 Nov 2004 18:54:07 -0500, jeichels@optonline.net wrote:
>
> Is there a way to use Lucene stemming and stop word removal without using the rest of the tool? I am downloading the code now, but I imagine the answer might be deeply buried. I would like to be able to send in a phrase and get back a collection of keywords if possible.
>
> I am thinking of using an intermediary solution before moving fully to Lucene. I don't have time to spend a month making a carefully tested, administrable Lucene solution for my site yet, but I intend to do so over time. Funny thing is, the Lucene code would likely only take up a couple hundred lines, but integration and administration would take me much more time.
>
> In the meantime, I am thinking I could perhaps use Lucene stemming and parsing of words, then stick each search word along with the associated primary key in an indexed MySQL table. Each record I would need to do this for is small, with maybe only 15 useful words on average. I would have an in-database solution, though ranking, etc. would not exist. This is better than the exact word searching I have currently, which is really bad.
>
> By the way, MySQL 4.1.1 has some Lucene-type handling, but it too does not have stemming, and I am sure it is very slow compared to Lucene. cPanel is still stuck on MySQL 4.0.*, so many people would not have access to even this basic ability in production systems for some time yet.
>
> JohnE
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
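
As a rough illustration of the "phrase in, keywords out" idea and the pull-tokens-with-next() pattern described in the reply, here is a self-contained plain-Java sketch. It deliberately uses no Lucene classes: the Tokens interface, the stop list, and crudeStem() are all hypothetical stand-ins, and crudeStem() is simple suffix stripping, not the Porter algorithm. With Lucene on the classpath, the TokenStream returned by an analyzer would presumably take the place of this hand-rolled pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java sketch; none of these types are Lucene classes.
class KeywordSketch {

    // Hypothetical pull-style stream: next() returns null when exhausted.
    interface Tokens {
        String next();
    }

    // Illustrative stop list only (an assumption, not Lucene's list).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "in", "is", "of", "the", "to", "were"));

    // Lower-cases the text and splits it on anything that is not a-z.
    static Tokens tokenize(String text) {
        final String[] words = text.toLowerCase(Locale.ROOT).split("[^a-z]+");
        return new Tokens() {
            private int i = 0;

            public String next() {
                while (i < words.length) {
                    String w = words[i++];
                    if (!w.isEmpty()) {
                        return w; // skip the empty piece split() can emit first
                    }
                }
                return null;
            }
        };
    }

    // Wraps a stream and silently drops stop words.
    static Tokens withoutStopWords(final Tokens in) {
        return () -> {
            for (String t = in.next(); t != null; t = in.next()) {
                if (!STOP_WORDS.contains(t)) {
                    return t;
                }
            }
            return null;
        };
    }

    // Crude stand-in for a stemmer: strips one common suffix,
    // but only if at least three letters remain.
    static String crudeStem(String w) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Wraps a stream and stems each token it passes through.
    static Tokens stemmed(final Tokens in) {
        return () -> {
            String t = in.next();
            return t == null ? null : crudeStem(t);
        };
    }

    // Phrase in, de-duplicated keywords out (in first-seen order).
    static List<String> keywords(String phrase) {
        Tokens stream = stemmed(withoutStopWords(tokenize(phrase)));
        Set<String> seen = new LinkedHashSet<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            seen.add(t);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // prints [search, index, word]
        System.out.println(keywords("Searching the indexed words"));
    }
}
```

Each returned keyword could then be stored next to the record's primary key in the indexed MySQL table mentioned in the quoted message; that storage step is left out of the sketch.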