From: "hugo burm"
List-Id: "Lucene Users List"
Subject: How does Lucene handle phrases containing words that are not indexed?
Date: Wed, 13 Feb 2002 17:32:07 +0100

How does Lucene handle phrases (literals) containing words that are not indexed (e.g. stopwords, one-letter words, numbers)?

I did some tests (the Lucene demo, my own 120000 XML documents, Cocoon search), and in all cases it looks as though a search for the phrase "a specification" also finds documents that contain "the specification" (or "D. Washington" instead of "G. Washington").

Of course you can change the indexing behaviour so that there are no stopwords and all one-letter words and numbers are indexed. But that seems like a bad approach.

A better approach:

1) Find all indexed words in the phrase, and from these words find all documents containing them.
2) Check for the occurrence of the phrase by opening the original document.

I am wondering: does Lucene perform step 2? Of course, this step burns some CPU cycles.

Hugo
hugob@xs4all.nl
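
To make the two steps concrete, here is a minimal sketch in plain Java. It does not use Lucene's API and says nothing about what Lucene actually does internally; the class name TwoStepPhraseSearch, the tiny stop list, and the rule that one-letter words are not indexed are all assumptions made for illustration.

```java
import java.util.*;

/** Hypothetical two-step phrase search; an illustration, not Lucene code. */
class TwoStepPhraseSearch {
    // Assumed stop list, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of");

    private final List<String> docs = new ArrayList<>();                  // original texts
    private final Map<String, Set<Integer>> postings = new HashMap<>();   // word -> doc ids

    /** Lower-case the text and split it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Assumption: stopwords and one-letter words are left out of the index. */
    static boolean isIndexed(String token) {
        return token.length() > 1 && !STOP_WORDS.contains(token);
    }

    /** Store the original text and index only the words that pass isIndexed. */
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String t : tokenize(text)) {
            if (isIndexed(t)) postings.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** Step 1: documents that contain every indexed word of the phrase. */
    Set<Integer> candidates(String phrase) {
        Set<Integer> result = null;
        for (String t : tokenize(phrase)) {
            if (!isIndexed(t)) continue;
            Set<Integer> ids = postings.getOrDefault(t, Set.of());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        if (result == null) {
            // The phrase has no indexed words, so every document is a candidate.
            result = new TreeSet<>();
            for (int i = 0; i < docs.size(); i++) result.add(i);
        }
        return result;
    }

    /** Step 2: confirm the full phrase, unindexed words included, in the original text. */
    List<Integer> search(String phrase) {
        List<String> wanted = tokenize(phrase);
        List<Integer> hits = new ArrayList<>();
        for (int id : candidates(phrase)) {
            if (Collections.indexOfSubList(tokenize(docs.get(id)), wanted) >= 0) hits.add(id);
        }
        return hits;
    }
}
```

With one document containing "the specification" and another containing "a specification", candidates("a specification") returns both, while search("a specification") keeps only the second. The cost is the one mentioned above: step 2 re-reads and re-tokenizes every candidate document.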