lucene-dev mailing list archives

From "Lex Lawrence" <>
Subject Re: Normalization
Date Tue, 12 Mar 2002 16:16:16 GMT
My unsolicited two cents:

I like Brian's idea.  Still, I'm curious if it would be possible (and 
prudent) to allow a little more flexibility.  In some cases it might be 
useful to use different but compatible Analyzers for indexing and searching.

An example would be to index all words, and then perform searches removing 
stopwords from the queries.  If I understand the process correctly this 
would achieve several things:  First it would decrease (not eliminate, I 
admit) the influence of stopwords in scoring, resulting in more relevant 
results.  Second, it would preserve information about the proximity of words 
and, depending on what you're interested in, make queries that use a slop 
factor more meaningful.  Finally, if you wanted, you could let a user choose 
at search time whether or not to remove the stopwords from queries, using 
different Analyzers but the same index.  The merit of this particular 
example may be debatable, but that is beside the point of this discussion.  The 
point is that it might be desirable to use different Analyzers for indexing 
and searching.
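
To make the example concrete, here is a minimal sketch in plain Java.  It 
does not use the Lucene API; the class name, the two analyze methods and the 
stopword list are all made up for illustration.  The index-time method keeps 
every word, and the search-time method tokenizes the same way and then drops 
the stopwords.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordSketch {
    // Hypothetical stopword list, chosen only for this illustration.
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("the", "of", "a", "and"));

    // Index-time analysis: lowercase, split on non-alphanumerics, keep every word.
    public static List<String> analyzeForIndex(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Search-time analysis: identical tokenization, then remove stopwords.
    public static List<String> analyzeForSearch(String text) {
        List<String> tokens = analyzeForIndex(text);
        tokens.removeAll(STOPWORDS);
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexed = analyzeForIndex("The Return of the King");
        List<String> query = analyzeForSearch("the king");
        System.out.println(indexed);                    // [the, return, of, the, king]
        System.out.println(query);                      // [king]
        System.out.println(indexed.containsAll(query)); // true
    }
}
```

The two methods are "compatible" in the sense meant here: every term the 
search side can produce is a term the index side would also have produced, 
and the indexed tokens keep their original positions.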

So... might there be a compromise?  Is there a way of indicating the type of 
Analyzer used to create an index and requiring that a compatible Analyzer be 
used for searches without requiring the exact same Analyzer?  I had thought 
that maybe compatible Analyzers could implement the same empty interface, 
but that would be difficult to do with Analyzers created from rules, 
wouldn't it?  I'm curious to hear what you folks think.
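
A rough sketch of the empty-interface idea, again in plain Java and not 
against the real Lucene classes.  The Analyzer interface here is only a 
stand-in, and LowercaseWordFamily, the three analyzer classes and 
isCompatible are hypothetical names.  It assumes the index somehow records 
which marker interface its Analyzer belonged to.

```java
public class CompatibilitySketch {
    // Stand-in for an analyzer type; NOT the Lucene Analyzer class.
    public interface Analyzer {}

    // Empty marker interface: analyzers that produce mutually compatible terms.
    public interface LowercaseWordFamily {}

    // Hypothetical analyzers. The first two claim to be compatible.
    public static class FullAnalyzer implements Analyzer, LowercaseWordFamily {}
    public static class StopFilteringAnalyzer implements Analyzer, LowercaseWordFamily {}
    public static class StemmingAnalyzer implements Analyzer {}

    // A search analyzer is accepted if it implements the marker interface
    // that was (by assumption) recorded when the index was created.
    public static boolean isCompatible(Class<?> indexFamily, Analyzer searchAnalyzer) {
        return indexFamily.isInstance(searchAnalyzer);
    }

    public static void main(String[] args) {
        Class<?> recorded = LowercaseWordFamily.class; // as if read from the index
        System.out.println(isCompatible(recorded, new StopFilteringAnalyzer())); // true
        System.out.println(isCompatible(recorded, new StemmingAnalyzer()));      // false
    }
}
```

The check only works for analyzers that exist as classes at compile time.  An 
Analyzer assembled from rules at run time has no class of its own to carry 
the marker, which is the difficulty mentioned above.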


>From: Brian Goetz <>
>Reply-To: "Lucene Developers List" <>
>To: Lucene Developers List <>
>Subject: Re: Normalization
>Date: Mon, 11 Mar 2002 14:56:18 -0800
> > Isn't this really a property of an index rather than an entire Lucene
> > build?
>Technically no, but in spirit, yes.
>Personally, I always liked the idea of creating an Analyzer at index
>creation time, and having the Analyzer object stored as a serialized
>object in the index.  Then you couldn't make the all-too-common
>mistake of indexing with one and then trying to search with another.
> > If so, having a text-based way to describe a policy is very helpful
> > and better than a source code-based one.