lucene-java-user mailing list archives

From Øyvind Stegard
Subject Re: Will storing docs affect Lucene's search performance?
Date Fri, 11 Aug 2006 13:15:59 GMT
On Friday 11 August 2006 15:07, Prasenjit Mukherjee wrote:
> I have a requirement (use of the highlighter) to store the doc content
> somewhere, and I am not allowed to use an RDBMS. I am thinking of using
> Lucene's Field with (Field.Store.YES and Field.Index.NO) to store the
> doc content. Will it have any negative effect on my search performance?
> I think I have read somewhere that Lucene shouldn't be used (or
> misused) to provide RDBMS-like storage.
We are using a stored binary version of every field we index in our content 
repository implementation (mostly just primitive data types, though). I asked 
a similar question earlier on this list. I'll just quote the reply I got:
> On 3/9/06, Øyvind Stegard wrote:
> > - How do many stored fields affect indexing/query
> > performance compared to storing no fields (indexing only)?
> Additional stored fields should have no effect on querying (the
> internal information about a field is looked up in a hashmap).
> Additional stored fields that are used have an impact on indexing, since
> that data must be copied every time segments are merged.
> Additional stored fields that are not used in most documents (sparse)
> should have very little performance impact on indexing.  The field
> list is walked a few times linearly (in-memory) during a segment
> merge, which should be very fast, but it's still O(n), so don't go
> crazy and have a million stored field types.
> > - Are there any known scalability issues with a large number of distinct
> > fields in an index (not necessarily the same set of fields for every doc)?
> If they are indexed fields, yes.
> Each indexed field has a 1 byte norm *per document*, regardless of whether
> the document contains that field.  In the current version of Lucene,
> there is a way to omit these norms on a per field basis (see
> Field.setOmitNorms()) if you don't need length normalization or
> index-time field boosting.
> -Yonik
> Solr, The Open Source Lucene Search Server
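
As a rough illustration of the norm cost described in the quoted reply, the arithmetic can be sketched in plain Java. This is only a back-of-the-envelope estimate based on the "1 byte per indexed field per document" rule stated above; the class and method names (NormCost, normBytes) are hypothetical and not part of Lucene, and the real footprint may differ between Lucene versions.

```java
/** Back-of-the-envelope estimate of norm storage (hypothetical helper, not a Lucene API). */
final class NormCost {
    private NormCost() {}

    /**
     * Applies the rule from the quoted reply: one byte per document for every
     * indexed field that keeps norms, whether or not a given document
     * actually contains that field.
     *
     * @param maxDoc                 number of documents in the index
     * @param indexedFieldsWithNorms number of distinct indexed fields that keep norms
     * @return estimated number of bytes used by norms
     */
    static long normBytes(long maxDoc, int indexedFieldsWithNorms) {
        if (maxDoc < 0 || indexedFieldsWithNorms < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // multiplyExact throws ArithmeticException on overflow instead of wrapping silently.
        return Math.multiplyExact(maxDoc, (long) indexedFieldsWithNorms);
    }
}
```

For example, NormCost.normBytes(1000000L, 200) gives 200,000,000 bytes, i.e. roughly 200 MB of norms for one million documents and 200 distinct indexed fields. Under the same rule, calling Field.setOmitNorms() on a field removes that field from the count, and fields that are only stored (Field.Index.NO) should not contribute, since the reply ties norms to indexed fields.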

Øyvind Stegard <oyvind stegard at usit uio no>
