lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] Created: (LUCENE-738) read/write .del as d-gaps when the deleted bit vector is sufficiently sparse
Date Mon, 04 Dec 2006 23:08:21 GMT
read/write .del as d-gaps when the deleted bit vector is sufficiently sparse 
-----------------------------------------------------------------------------

                 Key: LUCENE-738
                 URL: http://issues.apache.org/jira/browse/LUCENE-738
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Store
    Affects Versions: 2.1
            Reporter: Doron Cohen
         Assigned To: Doron Cohen


The .del file of a segment records which documents in that segment have been deleted. The file
exists only for segments that have deleted docs, so it does not exist for newly created segments
(e.g. those resulting from a merge). Each time an index reader that has deleted any document is
closed, the .del file is rewritten. In fact, since the lock-less commits change, a new
(generation of the) .del file is created on each such occasion.

For small indexes the current behavior poses no real problem. But for very large indexes,
writing a whole new bit vector each time such an index reader is closed seems like unnecessary
overhead when the bit vector is sparse (just a few docs were deleted). For instance,
for an index with a segment of 1M docs, the sequence {open reader; delete 1 doc from that
segment; close reader;} would write a file of ~128KB. Repeat this sequence 8 times, and 8 new
files with a total size of ~1MB are written to disk.
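
The arithmetic behind these figures can be checked with a small sketch. It assumes one bit per document, takes "1M docs" to mean 2^20, and ignores any file header bytes; the class and method names are hypothetical and not part of Lucene.

```java
public class DenseDelSize {
    /** Bytes needed to store one bit per document, rounded up to whole bytes. */
    static long denseBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    public static void main(String[] args) {
        long maxDoc = 1L << 20;              // "1M docs", taken here as 2^20 (assumption)
        long perRewrite = denseBytes(maxDoc); // size of one full bit vector
        System.out.println(perRewrite + " bytes per rewrite");
        System.out.println(8 * perRewrite + " bytes after 8 rewrites");
    }
}
```

Under these assumptions one rewrite costs 131072 bytes (128KB) and eight rewrites cost 1048576 bytes (1MB), regardless of how few documents were actually deleted.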

Whether this is a bottleneck depends on the application's delete pattern, but when
deleted docs are sparse, writing just the d-gaps would save space and time.
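
As an illustration of the idea only, here is a minimal sketch of one possible d-gap encoding. It assumes a d-gap is the difference between consecutive deleted doc IDs and that each gap is stored as a variable-length int (7 data bits per byte). The class DGapSketch and its methods are hypothetical; this is not the BitVector change mentioned in this issue, whose format may differ.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;

public class DGapSketch {
    /** Appends i as a variable-length int: 7 data bits per byte, high bit set on all but the last byte. */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    /** Reads a variable-length int starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] data, int[] pos) {
        byte b = data[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    /** Encodes the set bits as: count, then the gap from the previous set bit (the first gap is measured from 0). */
    static byte[] encode(BitSet deleted) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, deleted.cardinality());
        int prev = 0;
        for (int doc = deleted.nextSetBit(0); doc >= 0; doc = deleted.nextSetBit(doc + 1)) {
            writeVInt(out, doc - prev);
            prev = doc;
        }
        return out.toByteArray();
    }

    /** Rebuilds the bit set by summing the stored gaps. */
    static BitSet decode(byte[] data) {
        int[] pos = {0};
        int count = readVInt(data, pos);
        BitSet bits = new BitSet();
        int doc = 0;
        for (int n = 0; n < count; n++) {
            doc += readVInt(data, pos);
            bits.set(doc);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet(1 << 20);
        deleted.set(5);
        deleted.set(700000);
        byte[] encoded = encode(deleted);
        System.out.println("encoded bytes: " + encoded.length);
        System.out.println("round trip ok: " + decode(encoded).equals(deleted));
    }
}
```

In this sketch the encoded size grows with the number of deleted docs rather than with the segment size, so a reader would presumably still want the dense form once the vector stops being sparse.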

I have this (simple) change to BitVector running and am currently running some performance
tests to convince myself that it is worthwhile.




        
