lucene-dev mailing list archives

From Incze Lajos <in...@mail.matav.hu>
Subject Re: Performance implications of unanalyzed content
Date Fri, 16 Apr 2004 19:42:42 GMT
On Fri, Apr 16, 2004 at 08:59:42AM +0200, Magnus Johansson wrote:
> Hi
> 
> I'm developing an application using Lucene where I need to
> be able to both search using a stemmer and sometimes using
> "exact" search.
> 
> I see two ways of doing this:
> 
> 1. Use two indexes. One using a stemming analyzer and one using
>    a SimpleAnalyzer
> 
> 2. Use duplicate fields: one field with stemmed content and
>    one with unstemmed content. (Perhaps the field CONTENT would
>    become CONTENT and CONTENT_RAW.)
> 
> I'm leaning towards option 2. However, I'm interested in any performance
> implications. If I understand correctly, Lucene keeps a separate
> term dictionary for each field. So besides the index growing larger
> (which might affect caching), searching the index with duplicate fields
> should not be any slower when I only query the CONTENT field.
> 
> Is this correct?
> 
> 
> Magnus

I'm in exactly the same situation and I use your option 2. There may be
some performance impact, but in my case it is far too small for a human
to notice.
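
To make option 2 concrete, here is a rough toy model of the idea, not
Lucene code. It assumes one term dictionary per field, as described in the
question, and uses a made-up stemmer (toyStem) in place of a real stemming
analyzer. The class name DualFieldSketch and its methods are hypothetical;
only the field names CONTENT and CONTENT_RAW come from the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DualFieldSketch {
    // field name -> (term -> ids of documents containing it).
    // Each field has its own term dictionary, so a lookup in CONTENT
    // never touches the terms stored under CONTENT_RAW.
    private final Map<String, Map<String, Set<Integer>>> fields = new HashMap<>();

    // Toy stemmer (assumption): strips a trailing "ing" or "s".
    // A real application would use a proper stemming analyzer.
    static String toyStem(String token) {
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Lowercase the text and split it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index the same text twice: stemmed into CONTENT, unstemmed into CONTENT_RAW.
    void addDocument(int docId, String content) {
        for (String token : tokenize(content)) {
            post("CONTENT", toyStem(token), docId);
            post("CONTENT_RAW", token, docId);
        }
    }

    private void post(String field, String term, int docId) {
        fields.computeIfAbsent(field, f -> new TreeMap<>())
              .computeIfAbsent(term, t -> new TreeSet<>())
              .add(docId);
    }

    // Look a term up in the dictionary of a single field only.
    Set<Integer> search(String field, String term) {
        Map<String, Set<Integer>> dictionary = fields.get(field);
        if (dictionary == null) {
            return Collections.<Integer>emptySet();
        }
        Set<Integer> docs = dictionary.get(term);
        return docs == null ? Collections.<Integer>emptySet() : docs;
    }

    public static void main(String[] args) {
        DualFieldSketch index = new DualFieldSketch();
        index.addDocument(1, "Searching fields");
        index.addDocument(2, "search field");

        // Stemmed search: the query term goes through the same stemmer.
        System.out.println(index.search("CONTENT", toyStem("fields")));
        // "Exact" search: the query term is used as typed.
        System.out.println(index.search("CONTENT_RAW", "fields"));
    }
}
```

The stemmed query matches both documents, while the exact query matches
only the document that literally contains "fields". The cost of option 2 in
this model is the second dictionary, which makes the index larger but does
not add work to a query that only reads CONTENT.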

incze


