From: "Mailing Lists Account"
To: "Lucene Users List"
Subject: Re: Phrase query and porter stemmer
Date: Sat, 15 Feb 2003 03:24:47 +0530
Message-ID: <00dc01c2d473$b46e99f0$1400a8c0@nethi>
References: <187D6D956106D84E9D8B280F6458FE140F5B2A@merc12.na.sas.com>

Interesting. Thanks to Lucene and this list, I am learning a lot more
about how search engines work.

regards
Ramesh

----- Original Message -----
From: "Eric Isakson"
To: "Lucene Users List"
Sent: Thursday, February 13, 2003 9:10 PM
Subject: RE: Phrase query and porter stemmer

Ramesh,

I haven't closely examined the code that does this positioning, but
this is how I believe it works:

Let's say you had a token stream that returned the tokens "you", "are",
"running", "faster", "than", "me" and that didn't make any
setPositionIncrement calls. The default increment is 1. Each token in
the stream gets a position, which allows you to do things like
proximity searches. The query "are than"~3 would find the document that
token stream came from, since "are" occurs at position 2 and "than" at
position 5, and 5-2 <= 3.

Now let's say you wanted to stem "running" to "run" but keep the
original token. You would create a token filter that inserted the stem
"run" into the token stream when the "running" token occurred but also
kept the original token "running". If you didn't set the position
increment on the added token, it would get the default increment of 1
and push "than" to position 6, so the distance between "are" and "than"
would become 6-2 = 4, which is greater than 3, and your proximity query
would fail. When you set the position increment to zero for the added
token, it gets treated as if it is at the same position as the original
token, which prevents you from breaking your proximity query.

Proximity queries are the one place I know this affects. I'm unsure how
the positions affect other parts of Lucene.

Hope I got all that right and that it helps you understand
setPositionIncrement.

Eric

-----Original Message-----
From: Mailing Lists Account [mailto:mlists@imorph.com]
Sent: Thursday, February 13, 2003 7:07 AM
To: Lucene Users List
Subject: Re: Phrase query and porter stemmer

Hi Eric,

Thanks for the reply. The option of a custom token filter sounds good
to me.
I am not sure what the advantage of the Token.setPositionIncrement()
option is. Let me look into the docs before I ask further questions on
this.

regards
Ramesh

Eric Isakson wrote:

> You won't get hits for "security" if you do not use the stemmer when
> you search. The stem of "security" is the token that gets stored in
> the index.
>
> If you don't use the stemming algorithm when you create the index you
> could search for "security" and only get those documents that contain
> "security".
>
> See the FAQ
> http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q15
>
> If you have a list of terms you want to treat differently (i.e. you
> know there are certain words you don't want to stem) you could build
> a custom TokenFilter that checks the tokens for those words before
> applying the stemming algorithm, then add that TokenFilter to your
> analyzer. You might also consider allowing the tokens to be stemmed
> and adding the original non-stemmed term at the same position using
> Token.setPositionIncrement(0). You might also want to figure out some
> way to boost the score on those non-stemmed tokens when you build
> your query (not sure how you might accomplish that, but some custom
> query parsing code could do the trick).
>
> Eric
>
> -----Original Message-----
> From: Mailing Lists Account [mailto:mlists@imorph.com]
> Sent: Wednesday, February 12, 2003 4:17 AM
> To: lucene-user@jakarta.apache.org
> Subject: Phrase query and porter stemmer
>
>
> Hi,
>
> I use PorterStemmer with my analyzer for indexing the documents.
> And I have been using the same analyzer for searching too.
>
> When I search for a phrase like "security" AND database, I would like
> to avoid matches for terms like "secure" or "securities". I observed
> that Google and a couple of other search engines do not return such
> matches.
>
> 1) In other words, in a single query, is it possible to skip the
> Porter stemmer for phrase queries and still use it for other queries
> (such as term queries)?
>
> 2) As an alternative, is it advisable to manually construct a
> PhraseQuery by adding terms without applying the Porter stemmer?
>
> regards
> Ramesh
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
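
A footnote to Eric's explanation of setPositionIncrement: the position
arithmetic he describes can be sketched in plain Java. This is only a
minimal model and does not use the Lucene API. The class
PositionIncrementSketch, the Tok type and the methods withStem,
positionOf and within are hypothetical names made up for illustration.
The distance check is the simplified "difference of positions <= slop"
test from the explanation, which may not be how Lucene actually
evaluates sloppy phrase queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of token positions. Nothing here is Lucene code;
// all names are hypothetical and exist only for this illustration.
class PositionIncrementSketch {

    // Stand-in for a token: a term plus its position increment.
    static final class Tok {
        final String term;
        final int increment;

        Tok(String term, int increment) {
            this.term = term;
            this.increment = increment;
        }
    }

    // Keeps every original token (increment 1) and inserts the stem "run"
    // right after each "running" token with the given increment:
    // 0 stacks the stem on the original, 1 shifts every later token by one.
    static List<Tok> withStem(List<String> terms, int stemIncrement) {
        List<Tok> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Tok(term, 1));
            if (term.equals("running")) {
                out.add(new Tok("run", stemIncrement));
            }
        }
        return out;
    }

    // Position of the first occurrence of term, counting from 1; -1 if absent.
    static int positionOf(List<Tok> stream, String term) {
        int position = 0;
        for (Tok tok : stream) {
            position += tok.increment;
            if (tok.term.equals(term)) {
                return position;
            }
        }
        return -1;
    }

    // Simplified proximity test taken from the explanation above:
    // the two terms match if their positions differ by at most slop.
    static boolean within(List<Tok> stream, String a, String b, int slop) {
        int pa = positionOf(stream, a);
        int pb = positionOf(stream, b);
        return pa > 0 && pb > 0 && Math.abs(pb - pa) <= slop;
    }

    public static void main(String[] args) {
        List<String> terms =
                Arrays.asList("you", "are", "running", "faster", "than", "me");
        for (int increment : new int[] {1, 0}) {
            List<Tok> stream = withStem(terms, increment);
            System.out.println("stem increment " + increment
                    + ": \"than\" at position " + positionOf(stream, "than")
                    + ", \"are than\"~3 matches: "
                    + within(stream, "are", "than", 3));
        }
    }
}
```

With an increment of 1 on the inserted stem, "than" moves from position
5 to 6 and the simplified check fails; with an increment of 0 the stem
shares position 3 with "running" and the later positions are unchanged.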