lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-756) Maintain norms in a single file .nrm
Date Thu, 21 Dec 2006 04:28:22 GMT
     [ http://issues.apache.org/jira/browse/LUCENE-756?page=all ]

Doron Cohen updated LUCENE-756:
-------------------------------

    Attachment: nrm.patch.txt

Replacing the patch file (the previous file was garbage: the output of "svn stat" instead of "svn diff").

A few words on how this patch works:
- A <segment>.nrm file was added.
- addDocument (DocumentWriter) still writes each norm to a separate file, but that happens
in memory.
- At merge, all norms are written to a single file.
- CFS now also maintains all norms in a single file.
- The IndexWriter merge decision now considers hasSeparateNorms() not only for CFS but also
for non-compound indexes.
- SegmentReader.openNorms() still creates ready-to-use/load Norm objects (which read the
norms only when needed). But each Norm object is now assigned a normSeek value, which is
nonzero if the norm file is <segment>.nrm.
- Existing indexes that predate this change are managed the same way as segments that
result from addDocument.
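
To illustrate the normSeek idea above, here is a minimal sketch. It is not code from the
patch: the class name, the method names, and the 4-byte header are hypothetical and chosen
only for illustration. The header is an assumption that makes every offset into the single
file nonzero; the real layout is whatever the patch defines. Each field's norms (one byte
per document) are appended to one buffer, and the start offset is remembered so that a
reader can seek straight to the field it needs.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

public class SingleNormsFileSketch {

    // Hypothetical header size; assumed only so that every normSeek is nonzero.
    static final int HEADER_LENGTH = 4;

    /**
     * Concatenates the per-field norm arrays into one buffer and records,
     * in seekByField, the offset at which each field's norms start.
     */
    static byte[] writeNorms(Map<String, byte[]> normsByField,
                             Map<String, Long> seekByField) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[HEADER_LENGTH], 0, HEADER_LENGTH); // placeholder header
        for (Map.Entry<String, byte[]> entry : normsByField.entrySet()) {
            seekByField.put(entry.getKey(), (long) out.size());
            byte[] norms = entry.getValue();
            out.write(norms, 0, norms.length);
        }
        return out.toByteArray();
    }

    /** Reads maxDoc norm bytes for one field, starting at its normSeek offset. */
    static byte[] readNorms(byte[] file, long normSeek, int maxDoc) {
        byte[] norms = new byte[maxDoc];
        System.arraycopy(file, (int) normSeek, norms, 0, maxDoc);
        return norms;
    }

    public static void main(String[] args) {
        // Two fields, three documents each; the norm values are made up.
        Map<String, byte[]> normsByField = new LinkedHashMap<>();
        normsByField.put("title", new byte[] {1, 2, 3});
        normsByField.put("body", new byte[] {4, 5, 6});

        Map<String, Long> seekByField = new LinkedHashMap<>();
        byte[] file = writeNorms(normsByField, seekByField);

        byte[] bodyNorms = readNorms(file, seekByField.get("body"), 3);
        System.out.println("body normSeek = " + seekByField.get("body")
                + ", first norm = " + bodyNorms[0]);
    }
}
```

In this sketch a norm stored in its own file would simply use an offset of 0, which is how
the two cases could be told apart.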

Tests:
- I verified that the (contrib) tests for FieldNormModifier and LengthNormModifier also
pass.

Remaining:
- I might add a test.
- more benchmarking?
- update fileFormat document.

> Maintain norms in a single file .nrm
> ------------------------------------
>
>                 Key: LUCENE-756
>                 URL: http://issues.apache.org/jira/browse/LUCENE-756
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Doron Cohen
>         Assigned To: Doron Cohen
>            Priority: Minor
>         Attachments: nrm.patch.txt
>
>
> Non-compound indexes are ~10% faster at indexing and perform 50% of the IO activity
> compared to compound indexes. But their file descriptor footprint is much higher.
> By maintaining all field norms in a single .nrm file, we can bound the number of files
> used by non-compound indexes, and possibly allow more applications to use this format.
> More details on the motivation for this in: http://www.nabble.com/potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-tf2826909.html
> (in particular http://www.nabble.com/Re%3A-potential-indexing-perormance-improvement-for-compound-index---cut-IO---have-more-files-though-p7910403.html).
