lucene-java-user mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: How to delete partial index
Date Tue, 12 Dec 2006 21:05:47 GMT
spinergywmy <spinergywmy@gmail.com> wrote:
>
> Hi,
>
>    I have ask this question before but may be the question wasn't clear.
>
>    How can I delete particular index that I want to and keep the rest? For
> instance, I have been indexed document Id, date, user Id and contents, my
> question is does that particular contents will be deleted if I just
> specified the document Id, and I used reader.deleteDocument(document Id).

I am not 100% sure what you mean by "delete particular index".
Here "index" is mostly used as in "The result of indexing documents with
Lucene is an __INDEX__ that can then be searched, updated, etc."
But perhaps by index you mean "an __ID__ of a certain document"?

Anyhow, assume you created a Lucene index and added some 1000 documents
to it. You can now delete documents from that index. The remaining index
would still be valid and useful, but would have fewer documents that are
valid as search results. For instance, if you make 500 calls:
  reader.deleteDocument(0), reader.deleteDocument(2), ..., reader.deleteDocument(998)
your index would now have only 500 remaining documents that are valid as
search results. Their (internal) docids would (temporarily) be:
1,3,5,...,999. A few things to notice about this:
1) The fact that these documents were deleted would be:
- reflected immediately in searches that use a Searcher opened against the
same IndexReader used for these deletions.
- reflected in the index Directory only once the deleting IndexReader is
closed.
- reflected in searches through other IndexSearchers only if they are
opened after the deleting IndexReader is closed.
2) The "deleted" documents are still in the index, for some time. They are
excluded from search results, since they are marked deleted. When an index
segment is merged (either explicitly as a result of a call to optimize() or
implicitly as a result of adding a document or closing an index writer), the
segment's deleted documents are actually removed, and the docids are
renumbered so as to close the gaps in the (internal) docids. In the example
above, after optimize(), you would have (internal) docids: 0,1,2,...,499.
3) As a consequence of all this, it is usually not the best thing to count
on (internal) docids, and so deleting documents by a term would usually be
safer.
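
To make points 2) and 3) concrete, here is a small toy model in plain Java.
It is NOT Lucene code and does not use any Lucene API: the class ToySegment,
its methods and the "doc-N" keys are all made up for illustration. It only
imitates the behaviour described above, where a delete is a mark and a merge
renumbers the surviving documents.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene code. It imitates a single index segment
// in which deletions are marks and a merge renumbers the survivors.
class ToySegment {
    private final List<String> keys = new ArrayList<>();      // hypothetical application key per document
    private final List<Boolean> deleted = new ArrayList<>();  // true = marked deleted

    // Appends a document; its internal docid is its current position.
    void add(String key) {
        keys.add(key);
        deleted.add(false);
    }

    // Marks one slot as deleted; the slot itself stays in place.
    void deleteByDocId(int docId) {
        deleted.set(docId, true);
    }

    // Marks every live document with this key; returns how many were marked.
    int deleteByKey(String key) {
        int count = 0;
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i) && keys.get(i).equals(key)) {
                deleted.set(i, true);
                count++;
            }
        }
        return count;
    }

    // Internal docids of the documents that are still valid search results.
    List<Integer> liveDocIds() {
        List<Integer> live = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i)) live.add(i);
        }
        return live;
    }

    // Current docid of the live document with this key, or -1 if there is none.
    int docIdOf(String key) {
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i) && keys.get(i).equals(key)) return i;
        }
        return -1;
    }

    // Imitates a merge: drops marked documents and renumbers the rest from 0.
    void merge() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted.get(i)) kept.add(keys.get(i));
        }
        keys.clear();
        keys.addAll(kept);
        deleted.clear();
        for (int i = 0; i < kept.size(); i++) deleted.add(false);
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment();
        for (int i = 0; i < 1000; i++) seg.add("doc-" + i);
        for (int i = 0; i <= 998; i += 2) seg.deleteByDocId(i);

        System.out.println(seg.liveDocIds().size());    // 500
        System.out.println(seg.docIdOf("doc-999"));     // 999
        seg.merge();
        System.out.println(seg.docIdOf("doc-999"));     // 499
        System.out.println(seg.deleteByKey("doc-999")); // 1
    }
}
```

Note how the docid of "doc-999" changes from 999 to 499 across the merge,
while the key keeps identifying the same document. A docid remembered from
before the merge would now point at nothing, or at a different document.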

>
>    And I have another question is if I do normal cut and paste the document,
> how can I delete the index content from one destination and restore to
> another destination and the index file must merge.

Again I am not sure what you mean here.
Is the scenario that you have, say, two Lucene indexes, I1 and I2, where I1
has 100 documents (0..99) and I2 has 100 documents (0..99), and you want to
"cut and paste" some documents (say, all documents containing the
term "MoveMeToTheOtherIndex") from index I1 to I2? Assuming there are 50
documents like this in I1, after that there would be 50 (undeleted) docs
left in I1 and 150 docs in I2. If this is the case, Lucene does not
support this, and an application would need to implement it by itself,
adding the "moved" documents to index I2 (and deleting them from I1). Why
would you want to do this?
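
If that is indeed the scenario, the application-side logic is roughly
"add to I2, then delete from I1". Here is a toy sketch in plain Java, with
no Lucene API involved: the class ToyMove is made up, each "index" is just a
list of document texts, and a substring check stands in for a real term
match. With a real index the application would also need the original
content of each document at hand in order to re-add it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: NOT Lucene code. Each "index" is a plain list of document texts.
class ToyMove {
    // Moves every document containing the term from source to target.
    // Returns the number of documents moved.
    static int move(List<String> source, List<String> target, String term) {
        int moved = 0;
        Iterator<String> it = source.iterator();
        while (it.hasNext()) {
            String doc = it.next();
            if (doc.contains(term)) {   // simplification: substring instead of a term match
                target.add(doc);        // step 1: add the document to the other index
                it.remove();            // step 2: delete it from the original index
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> i1 = new ArrayList<>();
        List<String> i2 = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            // Every second document in I1 carries the marker term.
            i1.add(i % 2 == 0 ? "text MoveMeToTheOtherIndex " + i : "text " + i);
            i2.add("other " + i);
        }
        int moved = move(i1, i2, "MoveMeToTheOtherIndex");
        System.out.println(moved + " moved, I1=" + i1.size() + ", I2=" + i2.size());
    }
}
```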

>
>    Thanks
>
>
> regards,
> Wooi Meng
> --
> View this message in context:
> http://www.nabble.com/How-to-delete-partial-index-tf2806204.html#a7829277



