lucene-dev mailing list archives

From "Magnus Johansson" <mag...@technohuman.com>
Subject Performance implications of unanalyzed content
Date Fri, 16 Apr 2004 06:59:42 GMT
Hi

I'm developing an application using Lucene where I need to
be able to search both with a stemmer and, sometimes, with
"exact" (unstemmed) matching.

I see two ways of doing this:

1. Use two indexes: one built with a stemming analyzer and one built
   with a SimpleAnalyzer.

2. Use duplicate fields: one field with stemmed content and
   one with unstemmed content. (Perhaps the field CONTENT would become
   CONTENT and CONTENT_RAW.)
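
To make option 2 concrete, here is a minimal sketch in plain Java of one document carrying the same text in two fields. It is a toy model and not the Lucene API: the class DualFieldSketch, the helpers rawTokens, toyStem and buildDocument, and the crude suffix-stripping rule are all hypothetical stand-ins for a real analyzer and stemmer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of option 2 (duplicate fields). This is NOT the Lucene API.
class DualFieldSketch {

    // Lowercase and split on non-letters, roughly what a simple analyzer does.
    static List<String> rawTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Crude stand-in for a real stemmer: strips one of a few common suffixes.
    static String toyStem(String token) {
        for (String suffix : new String[] {"ing", "es", "s"}) {
            if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Build the two fields for one document from the same source text.
    static Map<String, List<String>> buildDocument(String text) {
        List<String> raw = rawTokens(text);
        List<String> stemmed = new ArrayList<>();
        for (String t : raw) {
            stemmed.add(toyStem(t));
        }
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("CONTENT", stemmed);   // searched when stemming is wanted
        doc.put("CONTENT_RAW", raw);   // searched for "exact" matches
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(buildDocument("Searching searches"));
    }
}
```

With this layout, a stemmed query would go against CONTENT and an "exact" query against CONTENT_RAW, both within a single index.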

I'm leaning towards option 2, but I'm interested in any performance
implications. If I understand correctly, Lucene keeps a separate
term dictionary for each field. So, apart from the index growing larger
(which might affect caching), searching an index with duplicate fields
should be no slower when I only query the CONTENT field.

Is this correct?
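
The assumption behind the question can be modelled with a small plain-Java sketch in which each field has its own term dictionary. This only illustrates the premise; it says nothing about how Lucene actually lays out its index on disk, and the class PerFieldIndexSketch with its methods add, search and lookupsIn is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index with one term dictionary per field. NOT the Lucene API.
class PerFieldIndexSketch {

    // field name -> (term -> sorted ids of documents containing the term)
    private final Map<String, TreeMap<String, TreeSet<Integer>>> fields = new HashMap<>();

    // field name -> number of dictionary lookups performed in that field
    private final Map<String, Integer> lookups = new HashMap<>();

    void add(String field, String term, int docId) {
        fields.computeIfAbsent(field, f -> new TreeMap<>())
              .computeIfAbsent(term, t -> new TreeSet<>())
              .add(docId);
    }

    // A query on one field consults only that field's dictionary.
    Set<Integer> search(String field, String term) {
        lookups.merge(field, 1, Integer::sum);
        TreeMap<String, TreeSet<Integer>> dictionary = fields.get(field);
        if (dictionary == null) {
            return Collections.<Integer>emptySet();
        }
        Set<Integer> hits = dictionary.get(term);
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }

    int lookupsIn(String field) {
        return lookups.getOrDefault(field, 0);
    }

    public static void main(String[] args) {
        PerFieldIndexSketch index = new PerFieldIndexSketch();
        index.add("CONTENT", "search", 1);
        index.add("CONTENT_RAW", "searching", 1);
        index.add("CONTENT", "search", 2);
        index.add("CONTENT_RAW", "searches", 2);

        System.out.println("hits: " + index.search("CONTENT", "search"));
        System.out.println("CONTENT_RAW lookups: " + index.lookupsIn("CONTENT_RAW"));
    }
}
```

In this model a query on CONTENT never touches the CONTENT_RAW dictionary, so the only cost of the extra field is the additional space it occupies. Whether the real index behaves the same way is exactly what is being asked.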


Magnus
