lucene-java-user mailing list archives

From "Doron Cohen" <cdor...@gmail.com>
Subject Re: document deletion problem
Date Wed, 19 Dec 2007 11:49:57 GMT
Hi Tushar,

This is an interesting scenario!

The problem arises from the way the search() methods that return
Hits work: initially only the first 100 matching documents are
collected, on the assumption that apps calling these methods will
not be interested in more documents than that, and that apps
traversing all matching documents (like yours) will use the
HitCollector API and provide their own HitCollector (your
HitCollector would then do the deletion).
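
To illustrate the collector pattern just described, here is a minimal, self-contained sketch. It does not use Lucene: DocCollector and ToyIndex are hypothetical stand-ins for HitCollector and the searcher/reader pair, invented only for this example. The point is that a callback sees every match in one pass, so nothing is re-queried or skipped while docs are being deleted.

```java
/** Toy model of collector-driven deletion; not Lucene code. */
class CollectorDeleteSketch {

    /** Hypothetical stand-in for a HitCollector-style callback. */
    interface DocCollector {
        void collect(int doc, float score);
    }

    /** Hypothetical toy index: every doc that is not deleted "matches". */
    static class ToyIndex {
        private final boolean[] deleted;

        ToyIndex(int size) {
            deleted = new boolean[size];
        }

        /** Visits each live doc exactly once, in a single pass. */
        void search(DocCollector collector) {
            for (int doc = 0; doc < deleted.length; doc++) {
                if (!deleted[doc]) {
                    collector.collect(doc, 1.0f);
                }
            }
        }

        void deleteDocument(int doc) {
            deleted[doc] = true;
        }

        int liveCount() {
            int live = 0;
            for (boolean d : deleted) {
                if (!d) {
                    live++;
                }
            }
            return live;
        }
    }

    /** Deletes every matching doc from inside the callback; returns how many. */
    static int deleteAllMatches(ToyIndex index) {
        int[] count = {0};
        index.search((doc, score) -> {
            index.deleteDocument(doc);
            count[0]++;
        });
        return count[0];
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(11475);
        System.out.println("deleted: " + deleteAllMatches(index));
        System.out.println("remaining: " + index.liveCount());
    }
}
```

In real code the callback would be your own HitCollector passed to the searcher; the toy only shows why a single pass avoids the batching issue.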

Anyhow, if an application requests the 101st matching doc,
under the hood the query is resubmitted, this time fetching
200 docs, of which the first 100 are ignored and the rest are
provided as results. If more than 200 are needed, the next
resubmission brings 400, then 800, etc.
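
The doubling described above can be sketched as follows. This is a toy calculation, not the Hits source: the class and method names are made up for illustration, and the starting size of 100 is taken from the explanation above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the doubling resubmission schedule; not Lucene code. */
class HitsFetchSchedule {

    /**
     * Returns the requested fetch size of each query submission needed to
     * walk through all totalMatches results: 100 first, then doubling.
     */
    static List<Integer> fetchSizes(int totalMatches) {
        List<Integer> sizes = new ArrayList<>();
        int n = 100;                  // initial batch size (assumed from the text)
        sizes.add(n);
        while (n < totalMatches) {    // results remain beyond what was fetched
            n *= 2;
            sizes.add(n);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Fetch sizes for a query with 11,475 matches.
        System.out.println(fetchSizes(11475));
    }
}
```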

Now, in your interesting scenario, you deleted every retrieved
doc. The sequence of fetch sizes across resubmissions is:
100, 200, 400, 800, 1,600, 3,200, 6,400, 12,800 (actually 11,475).
After the first 6,400 are deleted and you ask for result 6,401,
the query is resubmitted, but only 11,475 - 6,400 = 5,075 matches
are found. Since you asked for the 6,401st match, Hits attempts to
skip the first 6,400 and of course fails, because there are not that
many docs.
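
To check the arithmetic, here is a small simulation of the scenario. It is a toy model under the assumptions stated above (first batch of 100, doubling on each resubmission, deletions visible to the next query), not the actual Hits code; all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of deleting each hit while iterating batched results; not Lucene code. */
class DeleteWhileIteratingDemo {

    /** Returns the ids of up to n docs that are still live, in index order. */
    static List<Integer> search(boolean[] deleted, int n) {
        List<Integer> result = new ArrayList<>();
        for (int doc = 0; doc < deleted.length && result.size() < n; doc++) {
            if (!deleted[doc]) {
                result.add(doc);
            }
        }
        return result;
    }

    /** Returns the loop index at which the toy model fails, or -1 if it never does. */
    static int firstFailingIndex(int totalMatches) {
        boolean[] deleted = new boolean[totalMatches];
        List<Integer> cached = search(deleted, 100);   // initial batch
        for (int i = 0; i < totalMatches; i++) {
            if (i >= cached.size()) {
                // Resubmit with double the size, then skip what is already cached.
                List<Integer> fresh = search(deleted, cached.size() * 2);
                if (fresh.size() <= cached.size()) {
                    return i;   // nothing left after the skip: the lookup would fail
                }
                cached.addAll(fresh.subList(cached.size(), fresh.size()));
            }
            deleted[cached.get(i)] = true;             // delete the retrieved doc
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstFailingIndex(11475));
    }
}
```

With 11,475 matches the model stops at index 6,400, which agrees with the index reported in the original message.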

This seems like a bug: although Hits is not recommended for this
task for performance reasons, and you would be better off using a
HitCollector, this should still have worked correctly.

I tend to think that this should just be documented and not necessarily
fixed, but I am not 100% sure which of the two.

Could you file a JIRA Lucene issue for this?

Regards,
Doron

On Dec 19, 2007 12:10 PM, Tushar B <snowhow@sbcglobal.net> wrote:

> Hello All,
>
> I am seeing this issue and would like to understand if it's a bug or if I am
> missing something and doing it the wrong way:
>
> (Note that I am doing all exception handling - but deleted the exception
> handling code for sake of brevity below)
>
> Hits h = m_indexSearcher.search(q); // Returns 11475 documents
> for(int i = 0; i < h.length(); i++)
> {
> int doc = h.id(i);
> m_indexSearcher.getIndexReader().deleteDocument(doc);
> }
>
> The above hits Vector::ArrayIndexOutOfBoundsException when i = 6400. The
> problem happens in Hits::getMoreDocs.
>
> By the time 6400 docs are deleted, the majority is gone and
> topDocs.totalHits becomes less than 6400 (In this case 5075) and finally
> causes exception in the last line of Hits::hitDoc.
>
> I just took the example numbers which occurred in my case but this happens
> for any hits > 200 (initial vector size is 100 I guess).
>
> Any insight on the logic here will be very helpful (note: I have a
> workaround too)
>
> thanks
