lucene-java-user mailing list archives

From "Tyler V" <tyler...@gmail.com>
Subject Re: Corrupted Indexes Under Lucene 2.3 (and 2.3.1)
Date Sat, 01 Mar 2008 19:19:10 GMT
Thanks for the reply Yonik.

Our workflow is as follows:

We build a very large document and put the document on a queue to be
added to our "complete" index. This queue is serviced by a separate
thread, which actually adds the document to the "complete" index.

Once the document has been placed on the queue, we pause for other
processing, then remove a bunch of fields from the document and place
this document on a separate queue to be serviced by another thread,
which will add the document to a "summary" index.

The concurrency problem here is clear, as it is possible for a
document's fields to be modified while the document is being added to
the "complete" index.  From what you and Mike have mentioned, I
believe this simultaneous modification is causing the first exception.

The assumption that was being made in my algorithm is that the
"complete" index queue would be serviced before the document's fields
are removed for the "summary" index.  This assumption is not correct.
The solution that I am testing now is to "clone" the original document
before any fields are removed.  Thus the original document will not be
modified and the concurrency issues will be avoided.
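
A rough sketch of the clone-before-modify idea, for illustration only. It
uses a stand-in SimpleDoc class rather than Lucene's Document, so every
class and method name below is hypothetical and not part of the Lucene API.
The point is only that the summary is built from a copy, so the object
handed to the "complete" queue is never modified afterwards:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a document (NOT Lucene's API): an unsynchronized list of
// name/value pairs, similar in spirit to a Document backed by an ArrayList.
class SimpleDoc {
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] { name, value });
    }

    // Removes every field with the given name.
    void removeFields(String name) {
        fields.removeIf(f -> f[0].equals(name));
    }

    List<String> fieldNames() {
        List<String> names = new ArrayList<>();
        for (String[] f : fields) {
            names.add(f[0]);
        }
        return names;
    }

    // Copies the field list so the copy and the original can change
    // independently. Copying only reads the original.
    SimpleDoc copy() {
        SimpleDoc c = new SimpleDoc();
        for (String[] f : fields) {
            c.add(f[0], f[1]);
        }
        return c;
    }
}

class CloneBeforeTrim {
    // Builds the "summary" document from a copy; the original is untouched.
    static SimpleDoc summaryOf(SimpleDoc complete, String... fieldsToDrop) {
        SimpleDoc summary = complete.copy();
        for (String name : fieldsToDrop) {
            summary.removeFields(name);
        }
        return summary;
    }
}
```

With this shape, the original goes on the "complete" queue as-is and only
the result of summaryOf(...) goes on the "summary" queue, so neither
indexing thread ever sees a document that another thread is still changing.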

Given that this problem only occurred around once a week, it will take
me a while before I can report success.  But from reading the posts
from Mike and yourself, it seems that this is indeed the cause of the
corruption.

Thanks again for your insights. I will report back with my results.

Tyler

On Fri, Feb 29, 2008 at 6:01 PM, Yonik Seeley <yonik@apache.org> wrote:
>
> On Fri, Feb 29, 2008 at 7:05 PM, Tyler V <tylervsd@gmail.com> wrote:
> > Mike -- Thanks so much for the prompt reply.
> >
> >  You are right, we are accessing these documents with multiple threads
> >  (and have always been). However, I am wondering if the increased
> >  indexing speed in 2.3 has revealed a hidden concurrency issue.
>
> You are modifying the documents from multiple threads?
>
> My fault... I removed the synchronization on Document (changed from
> Vector to ArrayList).  It was never guaranteed to be thread-safe for
> modification, and almost never makes sense without external
> synchronization anyway.
>
> If you really need to modify a single document from multiple threads,
> please synchronize.
> That explains the first exception, but not the second.  I assume you
> aren't still changing the document while it's being indexed?
> It appears as if the original exception causes corruption.
>
> -Yonik
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
