lucene-dev mailing list archives

From "Magnus Johansson"
Subject Performance implications of unanalyzed content
Date Fri, 16 Apr 2004 06:59:42 GMT

I'm developing an application using Lucene where I need to
be able to search both with a stemmer and, sometimes, with
"exact" matching.

I see two ways of doing this:

1. Use two indexes: one built with a stemming analyzer and one
   built with a SimpleAnalyzer.

2. Use duplicate fields. One field with stemmed content and
   one with unstemmed content. (Perhaps the field CONTENT, will be

I'm leaning towards option 2. However, I'm interested in any performance
implications. If I understand correctly, Lucene keeps a separate
term dictionary for each field. So apart from the index growing larger
(which might affect caching), searching the index with duplicate fields
should be no slower when I only query the CONTENT field.

Is this correct?
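
To make the question concrete, here is a toy model of how I picture
option 2. It is plain Java, not Lucene code: the class name, the toyStem
helper and the CONTENT_EXACT field name are all made up for illustration,
and the nested map only mimics the idea of terms being grouped by field.
It is a sketch of my assumption, not a description of Lucene's actual
file format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index with one term dictionary per field. Not Lucene code. */
public class DualFieldIndexSketch {

    // field name -> (term -> ids of the documents containing that term)
    private final Map<String, Map<String, List<Integer>>> index = new TreeMap<>();
    private int nextDocId = 0;

    /** Hypothetical stand-in for a real stemmer: only strips a trailing "s". */
    public static String toyStem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    /** Indexes the text twice: stemmed under CONTENT, unstemmed under CONTENT_EXACT. */
    public int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            post("CONTENT", toyStem(token), docId);
            post("CONTENT_EXACT", token, docId); // assumed name for the duplicate field
        }
        return docId;
    }

    private void post(String field, String term, int docId) {
        List<Integer> postings = index
                .computeIfAbsent(field, f -> new TreeMap<>())
                .computeIfAbsent(term, t -> new ArrayList<>());
        // Record each document at most once per term.
        if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
            postings.add(docId);
        }
    }

    /** Looks a term up in a single field; the other field's terms are never visited. */
    public List<Integer> search(String field, String term) {
        Map<String, List<Integer>> terms = index.get(field);
        if (terms == null) {
            return new ArrayList<>();
        }
        List<Integer> postings = terms.get(term);
        return postings == null ? new ArrayList<>() : postings;
    }

    public static void main(String[] args) {
        DualFieldIndexSketch idx = new DualFieldIndexSketch();
        idx.addDocument("stemmed terms"); // doc 0
        idx.addDocument("one term");      // doc 1

        // Stemmed search: the query term is stemmed the same way as the content.
        System.out.println(idx.search("CONTENT", toyStem("terms"))); // prints [0, 1]
        // Exact search: only the document with the literal token matches.
        System.out.println(idx.search("CONTENT_EXACT", "terms"));    // prints [0]
    }
}
```

In this model a query on CONTENT only walks the CONTENT dictionary, so
the duplicate field costs space but adds no work per lookup. Whether the
real index behaves the same way is exactly what I'm asking.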

