lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Doron Cohen (JIRA)" <>
Subject [jira] Commented: (LUCENE-830) norms file can become unexpectedly enormous
Date Tue, 13 Mar 2007 19:21:09 GMT


Doron Cohen commented on LUCENE-830:

> You mean for some of the fields, using Fieldable's setOmitNorms(). 

Oops, just noticed this was already suggested that in that for that user thread...

Anyhow, for that specific scenario seems omitNorms would be sufficient, but it won't help
the db based example above.

> norms file can become unexpectedly enormous
> -------------------------------------------
>                 Key: LUCENE-830
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>            Priority: Minor
> Spinoff from this user thread:
> Norms are not stored sparsely, so even if a doc doesn't have field X
> we still use up 1 byte in the norms file (and in memory when that
> field is searched) for that segment.  I think this is done for
> performance at search time?
> For indexes that have a large # documents where each document can have
> wildly varying fields, each segment will use # documents times # fields
> seen in that segment.  When optimize merges all segments, that product
> grows multiplicatively so the norms file for the single segment will
> require far more storage than the sum of all previous segments' norm
> files.
> I think it's uncommon to have a huge number of distinct fields (?) so
> we would need a solution that doesn't hurt the more common case where
> most documents have the same fields.  Maybe something analogous to how
> bitvectors are now optionally stored sparsely?
> One simple workaround is to disable norms.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message