From: Mark Woon
Date: Tue, 26 Aug 2003 11:51:30 -0700
To: Lucene Users List
Subject: Re: Newbie Questions

Gregor Heinrich wrote:

> ad 1: MultiFieldQueryParser is what you might want: you can specify the
> fields to run the query on. Alternatively, the practice of duplicating the
> contents of all separate fields in question into one additional merged field
> has been suggested, which enables you to use QueryParser itself.

Ah, I've been testing out something similar to the latter. I've been adding multiple values under the same key. Won't this have the same effect? I've been assuming that if I do

    doc.add(Field.Keyword("content", "value1"));
    doc.add(Field.Keyword("content", "value2"));

and then search the "content" field for either value, I'd get a hit, and it seems to work.
This way, I figure I'd be able to differentiate between values that I want tokenized and values that I don't. Is there a difference between this and building a StringBuffer containing all the values and storing that as a single field value?

> ad 2: Depending on the Analyzer you use, the query is normalised, i.e.,
> stemmed (remove suffixes from words) and stopword-filtered (remove highly
> frequent words). Have a look at StandardAnalyzer.tokenStream(...) to see how
> the different filters work. In the analysis package the 1.3rc2 Lucene
> distribution has a Porter stemming algorithm: PorterStemmer.

There's an rc2 out? Where?? I just checked the Lucene website and only see rc1.

Thanks everyone for all the quick responses!

-Mark
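
A note on the multi-value versus StringBuffer question above: Field.Keyword values are indexed as single, untokenized terms, so the two approaches should behave differently. The sketch below is a toy inverted index in plain Java, not Lucene code; the class MultiValueFieldSketch and its methods addKeyword, addTokenized, matches and demoIndex are hypothetical names made up for illustration, and the whitespace-and-lowercase tokenizer is only a stand-in for a real Analyzer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT the Lucene API, just an illustration of
// untokenized (keyword-style) versus tokenized field values.
public class MultiValueFieldSketch {
    // field name -> term -> ids of documents containing that term
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    private void addTerm(int docId, String field, String term) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new HashSet<>())
                .add(docId);
    }

    // Models an untokenized keyword field: the whole value becomes one term.
    public void addKeyword(int docId, String field, String value) {
        addTerm(docId, field, value);
    }

    // Models a tokenized field with a simplistic stand-in analyzer:
    // split on whitespace and lowercase each token.
    public void addTokenized(int docId, String field, String value) {
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                addTerm(docId, field, token.toLowerCase());
            }
        }
    }

    // Exact term lookup, comparable to a single-term query on one field.
    public boolean matches(int docId, String field, String term) {
        Map<String, Set<Integer>> terms = postings.get(field);
        if (terms == null) {
            return false;
        }
        Set<Integer> docs = terms.get(term);
        return docs != null && docs.contains(docId);
    }

    public static MultiValueFieldSketch demoIndex() {
        MultiValueFieldSketch idx = new MultiValueFieldSketch();
        // Doc 1: two keyword values added under the same field name.
        idx.addKeyword(1, "content", "value1");
        idx.addKeyword(1, "content", "value2");
        // Doc 2: both values concatenated into one keyword value.
        idx.addKeyword(2, "content", "value1 value2");
        // Doc 3: the concatenated string stored in a tokenized field.
        idx.addTokenized(3, "content", "value1 value2");
        return idx;
    }

    public static void main(String[] args) {
        MultiValueFieldSketch idx = demoIndex();
        System.out.println(idx.matches(1, "content", "value1"));        // true
        System.out.println(idx.matches(1, "content", "value2"));        // true
        System.out.println(idx.matches(2, "content", "value1"));        // false
        System.out.println(idx.matches(2, "content", "value1 value2")); // true
        System.out.println(idx.matches(3, "content", "value1"));        // true
    }
}
```

In this toy model, repeating the field keeps each value searchable as its own exact term, while a concatenated keyword value only matches the full concatenated string. Concatenation behaves like the multi-value case only when the merged field is tokenized, which is presumably what the merged-field suggestion assumes.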