Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm
Message-ID: <3F4BAC32.1030406@helix.stanford.edu>
Date: Tue, 26 Aug 2003 11:51:30 -0700
From: Mark Woon <morpheus@helix.stanford.edu>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
 rv:1.4) Gecko/20030624
MIME-Version: 1.0
To: Lucene Users List <lucene-user@jakarta.apache.org>
Subject: Re: Newbie Questions
References: <000001c36bd0$60af59e0$49098c92@pcara>
In-Reply-To: <000001c36bd0$60af59e0$49098c92@pcara>
Content-Type: multipart/alternative;
 boundary="------------020306030804020308040601"

--------------020306030804020308040601
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Gregor Heinrich wrote:

> ad 1: MultiFieldQueryParser is what you might want: you can specify the
> fields to run the query on. Alternatively, the practice of duplicating the
> contents of all separate fields in question into one additional merged field
> has been suggested, which enables you to use QueryParser itself.
>

Ah, I've been testing out something similar to the latter.  I've been 
adding multiple values on the same key.  Won't this have the same 
effect?  I've been assuming that if I do

doc.add(Field.Keyword("content", "value1"));
doc.add(Field.Keyword("content", "value2"));

and then search the "content" field for either value, I'd get a hit, 
and it seems to work.  This way, I figure I can differentiate 
between values that I want tokenized and values that I don't.

Is there a difference between this and building a StringBuffer 
containing all the values and storing that as a single field-value?
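
To make the question concrete, here is a toy model in plain Java of how I 
picture the two approaches.  It is not Lucene code: the class and method 
names are made up for illustration, and it rests on my assumption that an 
untokenized (Keyword-style) value is indexed as one exact term.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only, NOT Lucene. Assumption: an untokenized (Keyword-style)
// value is indexed as a single exact term under its field name.
public class KeywordFieldSketch {
    // field name -> exact terms indexed under that name
    private final Map<String, List<String>> terms = new HashMap<>();

    // Stands in for doc.add(Field.Keyword(name, value)):
    // the whole value becomes one term, with no tokenizing.
    public void addKeyword(String name, String value) {
        terms.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Stands in for an exact term query against one field.
    public boolean matches(String name, String term) {
        List<String> indexed = terms.get(name);
        return indexed != null && indexed.contains(term);
    }

    public static void main(String[] args) {
        // Approach 1: two separate values under the same field name.
        KeywordFieldSketch separate = new KeywordFieldSketch();
        separate.addKeyword("content", "value1");
        separate.addKeyword("content", "value2");

        // Approach 2: one concatenated, untokenized value.
        KeywordFieldSketch merged = new KeywordFieldSketch();
        merged.addKeyword("content", "value1 value2");

        System.out.println(separate.matches("content", "value1"));
        System.out.println(merged.matches("content", "value1"));
    }
}
```

If that assumption holds, the concatenated version would only match when it 
is tokenized, since otherwise "value1 value2" is one single term.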


> ad 2: Depending on the Analyzer you use, the query is normalised, i.e.,
> stemmed (remove suffixes from words) and stopword-filtered (remove highly
> frequent words). Have a look at StandardAnalyzer.tokenStream(...) to see how
> the different filters work. In the analysis package the 1.3rc2 Lucene
> distribution has a Porter stemming algorithm: PorterStemmer.
>
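
As I read it, the normalisation described above amounts to something like 
the following toy sketch.  It is plain Java rather than Lucene's analysis 
classes, the stop-word list is a made-up example, and it only lowercases 
and drops stop words (no stemming).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy sketch only, NOT Lucene's Analyzer API.
public class AnalysisSketch {
    // Hypothetical stop-word list, chosen purely for illustration.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of", "and"));

    // Lowercase the text, split on non-alphanumerics, drop stop words.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        String lowered = text.toLowerCase(Locale.ROOT);
        for (String token : lowered.split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Art of Lucene"));
    }
}
```

Presumably the same steps have to run on both the indexed text and the 
query for the resulting terms to line up.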

There's an rc2 out?  Where??  I just checked the Lucene website and only 
see rc1.


Thanks everyone for all the quick responses!

-Mark


--------------020306030804020308040601--