lucene-java-user mailing list archives

From "Nadav Har'El"
Subject Re: problems with deleteDocuments
Date Sun, 08 Jul 2007 06:39:58 GMT
On Wed, Jul 04, 2007, Erick Erickson wrote about "Re: problems with deleteDocuments":
> Consider what would happen otherwise. Say you have documents
> with the following values for a field (call it blah).
> some data
> some data I put in the index
> lots of data
> data
> Then I don't want deleting on the term blah:data to remove all
> of them. Which seems to be what you're asking. Even if
> you restricted things to "phrases", then deleting on the term
> 'blah:some data' would remove two documents.
> So, while UN_TOKENIZED isn't a *requirement*, exact total term
> matches *is* the requirement. By that, I meant that whatever
> goes into the field must not be broken into pieces by the indexing
> tokenizer for deletes to work as you expect.

I disagree, and frankly, am very surprised that "exact total term matches" is
actually a requirement (I never tried it, so you may be absolutely right, I
just hope you aren't).

Let me give you just one example where id fields containing multiple words,
and the ability for a delete query to match several documents, are useful.

Consider an application for indexing emails with attachments. The email text,
and each document attachment, is indexed as a separate document. When an
email is deleted, we also need to delete its attachments. How shall we do
this? One simple implementation is to have an "id" field for each document.
The email text document will have a unique id, and the attachment document
will have two ids: its own unique id, and the containing email's id. When
we need to remove an email and all its attachments, we just remove all
documents that match the email's id - and this will include the main text
and the attachments.
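
To make the scheme concrete, here is a minimal sketch of it. It is a toy
in-memory model, not Lucene code: the ToyIndex class, its methods, and the
ids "mail-1", "att-1a" and so on are all made up for illustration. It only
assumes the behaviour I am hoping for, namely that a delete removes every
document whose "id" field contains the given term as one of its values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy stand-in for an index. This is NOT Lucene's API, only a model of
// delete-by-term over a field that may hold several values per document.
class ToyIndex {
    // Document name -> (field name -> set of exact terms indexed for that field).
    private final Map<String, Map<String, Set<String>>> docs = new HashMap<>();

    // Adds a document under a hypothetical name; a field may carry several values.
    void add(String name, String field, String... values) {
        Map<String, Set<String>> fields = docs.computeIfAbsent(name, k -> new HashMap<>());
        Set<String> terms = fields.computeIfAbsent(field, k -> new HashSet<>());
        for (String value : values) {
            terms.add(value);
        }
    }

    // Removes every document whose field contains the exact term.
    // Returns the number of documents removed.
    int deleteDocuments(String field, String term) {
        List<String> doomed = new ArrayList<>();
        for (Map.Entry<String, Map<String, Set<String>>> entry : docs.entrySet()) {
            Set<String> terms = entry.getValue().get(field);
            if (terms != null && terms.contains(term)) {
                doomed.add(entry.getKey());
            }
        }
        for (String name : doomed) {
            docs.remove(name);
        }
        return doomed.size();
    }

    boolean contains(String name) {
        return docs.containsKey(name);
    }

    int size() {
        return docs.size();
    }

    // One email with two attachments, plus an unrelated email.
    static ToyIndex sample() {
        ToyIndex index = new ToyIndex();
        index.add("mail-1 body", "id", "mail-1");
        // Each attachment carries its own id and the containing email's id.
        index.add("mail-1 attachment a", "id", "att-1a", "mail-1");
        index.add("mail-1 attachment b", "id", "att-1b", "mail-1");
        index.add("mail-2 body", "id", "mail-2");
        return index;
    }

    public static void main(String[] args) {
        ToyIndex index = sample();
        int removed = index.deleteDocuments("id", "mail-1");
        System.out.println("removed " + removed + ", remaining " + index.size());
    }
}
```

Deleting on "mail-1" takes out the email body and both attachments in one
call and leaves the unrelated email alone. Whether a real Lucene index
behaves this way for a field with several values is exactly the question in
this thread, so treat the sketch as a statement of the desired semantics.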

By the way, the method is called "deleteDocuments" - doesn't that imply
that it's perfectly acceptable to delete many documents with one term?

Nadav Har'El                        |      Sunday, Jul  8 2007, 22 Tammuz 5767
IBM Haifa Research Lab              |-----------------------------------------
                                    |I am not a complete idiot - some parts
                                    |are missing.
