lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-830) norms file can become unexpectedly enormous
Date Tue, 13 Mar 2007 14:14:09 GMT
norms file can become unexpectedly enormous
-------------------------------------------

                 Key: LUCENE-830
                 URL: https://issues.apache.org/jira/browse/LUCENE-830
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.1
            Reporter: Michael McCandless
            Priority: Minor



Spinoff from this user thread:

   http://www.gossamer-threads.com/lists/lucene/java-user/46754

Norms are not stored sparsely, so even if a doc doesn't have field X
we still use up 1 byte in the norms file (and in memory when that
field is searched) for that segment.  I think this is done for
performance at search time?

For indexes that have a large # documents where each document can have
wildly varying fields, each segment will use # documents times # fields
seen in that segment.  When optimize merges all segments, that product
grows multiplicatively so the norms file for the single segment will
require far more storage than the sum of all previous segments' norm
files.

I think it's uncommon to have a huge number of distinct fields (?) so
we would need a solution that doesn't hurt the more common case where
most documents have the same fields.  Maybe something analogous to how
bitvectors are now optionally stored sparsely?

One simple workaround is to disable norms.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message