lucene-dev mailing list archives

From "Yonik Seeley (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3828) Impossible to delete doc by docId, undeleteAll or setNorm(docId..)
Date Mon, 27 Feb 2012 17:16:49 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217304#comment-13217304 ]

Yonik Seeley commented on LUCENE-3828:
--------------------------------------

bq. I agree with point (1), because it might be handy to delete documents by internal docId.
I am not sure why this would not be possible through IndexWriter, but the problem here is
background merging (or merging at all): with TieredMergePolicy, even with an open IndexWriter,
the docIds can change suddenly. The only way to get stable docIds would be some mode to freeze
IndexWriter's merging, get an NRT reader, delete documents using the integer ID on IndexWriter,
then unfreeze and commit. IndexReader should of course stay read-only.

Seems like the best way to deleteByDocId in the IndexWriter is to somehow express it as a
custom Query (rather than trying to freeze IndexWriter).
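To illustrate the point about unstable docIds, here is a minimal toy sketch. It is not Lucene code and uses no Lucene API: ToyIndex, deleteByDocId, deleteByQuery and merge are hypothetical names invented for this example. It assumes a merge simply drops deleted slots and renumbers the survivors, and shows that a docId captured before the merge points at a different document afterwards, while a query-style delete matches on content and is unaffected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model only, not Lucene: a document is a string and its docId is its slot position. */
class ToyIndex {
    // One slot per document; null marks a deleted document not yet reclaimed by a merge.
    private final List<String> slots = new ArrayList<>();

    int add(String doc) {
        slots.add(doc);
        return slots.size() - 1; // docId at the time of adding
    }

    String get(int docId) {
        return slots.get(docId);
    }

    void deleteByDocId(int docId) {
        slots.set(docId, null);
    }

    /** Query-style delete: marks every live doc that matches, whatever its current docId is. */
    int deleteByQuery(Predicate<String> query) {
        int deleted = 0;
        for (int i = 0; i < slots.size(); i++) {
            String doc = slots.get(i);
            if (doc != null && query.test(doc)) {
                slots.set(i, null);
                deleted++;
            }
        }
        return deleted;
    }

    /** Simulated merge: reclaims deleted slots, so surviving docs are renumbered. */
    void merge() {
        slots.removeIf(doc -> doc == null);
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add("a");
        idx.add("b");
        int staleId = idx.add("c"); // docId 2 before any merge
        idx.add("d");

        idx.deleteByDocId(0); // delete "a"
        idx.merge();          // survivors shift down by one

        // The docId captured earlier now refers to a different document.
        System.out.println(idx.get(staleId)); // prints "d"

        // A query-style delete still finds the intended document.
        System.out.println(idx.deleteByQuery("c"::equals)); // prints 1
    }
}
```

In this toy model, deleting by the stale id after the merge would remove "d" instead of "c", which is the hazard a frozen-merge mode would have to guard against; expressing the delete as a query sidesteps it.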
                
> Impossible to delete doc by docId, undeleteAll or setNorm(docId..)
> ------------------------------------------------------------------
>
>                 Key: LUCENE-3828
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3828
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Andrzej Bialecki 
>
> It appears that there is a major regression in the trunk API. It's no longer possible
> to:
> 1. delete a document by internal id (even though you can iterate and retrieve docs by
> internal ids)
> 2. undelete all deleted (but not yet reclaimed) documents
> 3. set the norm value on a specific document (by internal id)
> The lack of #1 means that you have to use delete by term or by query, which in turn means
> that now we require that documents have a unique primary key (otherwise you won't be able
> to delete a particular document that shares terms with other docs). IMHO this item is
> critical and should be fixed.
> The lack of #2 might not be critical but it still comes in handy in some situations.
> The lack of #3 means that you have to update the whole doc if you just want to correct
> one field, which might be ok for the time being - it's a special case of not having
> updateable fields in general. But it's quite inconvenient if all you want to do is to
> adjust the weight of a doc without reindexing, something that is possible with 3.x.
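
The primary-key concern in the description above can be shown with a small toy sketch. This is not the Lucene API: TermDeleteDemo, deleteByTerm and the field names are hypothetical, and a document is modelled as a map from field name to a single term. Deleting by a shared term removes every document carrying it, while deleting by a unique key field removes exactly one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model only, not the Lucene API: a document is a map of field name to one term. */
class TermDeleteDemo {
    /** Removes every document whose field holds the given term; returns how many were removed. */
    static int deleteByTerm(List<Map<String, String>> docs, String field, String term) {
        int before = docs.size();
        docs.removeIf(doc -> term.equals(doc.get(field)));
        return before - docs.size();
    }

    /** Three sample docs; "id" is unique, "color" is shared by two of them. */
    static List<Map<String, String>> sample() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(Map.of("id", "1", "color", "red"));
        docs.add(Map.of("id", "2", "color", "red"));
        docs.add(Map.of("id", "3", "color", "blue"));
        return docs;
    }

    public static void main(String[] args) {
        // Shared term: both red documents go, there is no way to target just one.
        System.out.println(deleteByTerm(sample(), "color", "red")); // prints 2

        // Unique key: exactly the intended document is removed.
        System.out.println(deleteByTerm(sample(), "id", "2")); // prints 1
    }
}
```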
