lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
Date Mon, 26 Oct 2009 10:50:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769980#action_12769980 ]

Michael McCandless commented on LUCENE-2010:
--------------------------------------------

bq. If you delete all documents from the whole index, no segments would keep alive if automatically
removed.

IW now has a dedicated method to [efficiently] delete all docs, but yeah we should also short-circuit
this, in case someone didn't use that method and instead actually deleted every doc separately.

I'd think that our solution here would automatically handle this case (drop all segments)
as well.

On materializing deletes (IndexWriter.applyDeletes) we should simply sweep the segmentInfos,
and drop any fully deleted segments.  Should be a simple change.
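
A minimal sketch of that sweep, for illustration only. It does not use Lucene's classes: SegmentSketch, SegmentSweepSketch and dropFullyDeleted are hypothetical names, and the segment "_dln" is made up for the example ("_dlo" with 5 of 5 docs deleted mirrors the CheckIndex output quoted below). It models only the list bookkeeping, not file cleanup or interaction with running merges.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a per-segment record; NOT Lucene's SegmentInfo.
final class SegmentSketch {
    final String name;
    final int docCount; // documents written to the segment
    final int delCount; // documents currently marked deleted

    SegmentSketch(String name, int docCount, int delCount) {
        this.name = name;
        this.docCount = docCount;
        this.delCount = delCount;
    }

    // A segment is fully deleted when every document in it is marked deleted.
    boolean isFullyDeleted() {
        return delCount == docCount;
    }
}

final class SegmentSweepSketch {
    // Removes fully deleted segments from the list in place and
    // returns the names of the segments that were dropped.
    static List<String> dropFullyDeleted(List<SegmentSketch> segments) {
        List<String> dropped = new ArrayList<>();
        for (Iterator<SegmentSketch> it = segments.iterator(); it.hasNext();) {
            SegmentSketch s = it.next();
            if (s.isFullyDeleted()) {
                it.remove();
                dropped.add(s.name);
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        List<SegmentSketch> segments = new ArrayList<>();
        segments.add(new SegmentSketch("_dln", 120, 3)); // made-up segment, partly deleted
        segments.add(new SegmentSketch("_dlo", 5, 5));   // all 5 docs deleted
        List<String> dropped = dropFullyDeleted(segments);
        System.out.println("dropped=" + dropped + " remaining=" + segments.size());
    }
}
```

If every segment in the list is fully deleted, the same loop leaves the list empty, which is the "deleted every doc separately" case mentioned above.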

> Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2010
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2010
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>
> I do not know if this is a bug in 2.9.0, but it seems that segments with all documents
> deleted are not automatically removed:
> {noformat}
> 4 of 14: name=_dlo docCount=5
>   compound=true
>   hasProx=true
>   numFiles=2
>   size (MB)=0.059
>   diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - 2009-09-21 10:25:09, os=SunOS, os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, source=flush}
>   has deletions [delFileName=_dlo_1.del]
>   test: open reader.........OK [5 deleted docs]
>   test: fields..............OK [136 fields]
>   test: field norms.........OK [136 fields]
>   test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens]
>   test: stored fields.......OK [0 total field count; avg ? fields per doc]
>   test: term vectors........OK [0 total vector count; avg ? term/freq vector fields per doc]
> {noformat}
> Shouldn't such segments be removed automatically during the next commit/close of
> IndexWriter?
> *Mike McCandless:*
> Lucene doesn't actually short-circuit this case, ie, if every single doc in a given segment
> has been deleted, it will still merge it [away] like normal, rather than simply dropping it
> immediately from the index, which I agree would be a simple optimization. Can you open a new
> issue? I would think IW can drop such a segment immediately (ie not wait for a merge or optimize)
> on flushing new deletes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

