lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Problems with reopening IndexReader while pushing documents to the index
Date Tue, 01 Jul 2008 12:29:02 GMT

Aha!  OK now I see how that led to your exception.

When you create a MultiReader, passing in the array of IndexReaders,
MultiReader simply holds onto your array.  It also computes & caches
norms() the first time it's called, based on the total # docs it sees
in all the readers in that array.

But when you then reopened individual readers in that array, without
creating a new MultiReader, the cached norms array became "stale"
(still sized for the old doc count), and thus it's easily possible to
encounter a docID that's out of bounds.

I think a good fix for this sort of trap would be for MultiReader to  
make a private copy of the array that's passed in.  I'll open an issue.
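
A minimal sketch of the trap, using plain Java stand-ins rather than the
real Lucene classes.  Every name below (SubReader, CompositeReader,
consistentAfterSwap) is hypothetical and only mimics the behaviour
described above: a composite that keeps the caller's array and sizes a
cache on first use.  It is an illustration under those assumptions, not
how MultiReader is actually implemented.

```java
public class StaleCompositeDemo {

    /** Hypothetical stand-in for a sub-reader: it only knows its document count. */
    static final class SubReader {
        final int maxDoc;
        SubReader(int maxDoc) { this.maxDoc = maxDoc; }
    }

    /** Hypothetical stand-in for a composite reader that caches a per-document array. */
    static final class CompositeReader {
        private final SubReader[] subs;
        private byte[] cachedNorms; // sized once, on first use

        CompositeReader(SubReader[] subs, boolean copy) {
            // copy == false keeps the caller's array (the trap);
            // copy == true takes a private snapshot (the proposed fix).
            this.subs = copy ? subs.clone() : subs;
        }

        /** Total document count across the sub-readers currently in the array. */
        int maxDoc() {
            int total = 0;
            for (SubReader s : subs) total += s.maxDoc;
            return total;
        }

        /** Allocates the cache on the first call and returns the same array afterwards. */
        byte[] norms() {
            if (cachedNorms == null) cachedNorms = new byte[maxDoc()];
            return cachedNorms;
        }

        /** True when every docID the sub-readers can produce fits in the cached array. */
        boolean consistent() {
            return maxDoc() <= norms().length;
        }
    }

    /** Builds a composite, warms its cache, then swaps a sub-reader in the caller's array. */
    static boolean consistentAfterSwap(boolean copy) {
        SubReader[] readers = { new SubReader(100), new SubReader(200) };
        CompositeReader composite = new CompositeReader(readers, copy);
        composite.norms();               // cache is sized for 300 documents
        readers[1] = new SubReader(500); // caller "reopens" one sub-reader in place
        return composite.consistent();
    }

    public static void main(String[] args) {
        System.out.println("shared array: consistent = " + consistentAfterSwap(false));
        System.out.println("private copy: consistent = " + consistentAfterSwap(true));
    }
}
```

With the shared array, the composite sees 600 documents but its cache
still holds 300 entries, so a docID can land out of bounds.  With the
private copy, the composite keeps searching the readers it was built
with, and the caller has to build a new composite to see the reopened
ones.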

Mike

Sascha Fahl wrote:

> Yes I am using IndexReader.reopen(). Here is my code doing this:
> 	public void refreshIndeces() throws CorruptIndexException, IOException {
> 		if ((System.currentTimeMillis() - this.lastRefresh) > this.REFRESH_PERIOD) {
> 			this.lastRefresh = System.currentTimeMillis();
> 			boolean refreshFlag = false;
> 			for (int i = 0; i < this.indeces.length; i++){
> 				IndexReader newIR = this.indeces[i].reopen();
> 				if (newIR != this.indeces[i]){
> 					this.indeces[i].close();
> 					refreshFlag = true;
> 				}
> 				this.indeces[i] = newIR;
> 			}
> 			if(refreshFlag){
> 				this.multiReader = new MultiReader(this.indeces);
> 				this.multiSearcher = new IndexSearcher(this.multiReader);
> 			}
> 		}
> 	}
> As you can see, I am using a MultiReader. After creating a new
> MultiReader and a new IndexSearcher, the exception goes away. I
> tested it by updating the index with 50000 documents and sending
> 60000 requests, and nothing bad happened.
>
> Sascha
>
>
> On 01.07.2008 at 12:14, Michael McCandless wrote:
>
>>
>> That's interesting.  So you are using IndexReader.reopen() to get a  
>> new reader?  Are you closing the previous reader?
>>
>> The exception goes away if you create a new IndexSearcher on the  
>> reopened IndexReader?
>>
>> I don't yet see how that could explain the exception, though.  If
>> you reopen() the underlying IndexReader in an IndexSearcher, the
>> original IndexReader should still be intact and still searching the
>> point-in-time snapshot that it had been opened on.  IndexSearcher
>> itself doesn't hold any "state" about the index (I think); it
>> relies on IndexReader for that.
>>
>> Mike
>>
>> Sascha Fahl wrote:
>>
>>> I think I have solved the "problem". It was not a Lucene-specific
>>> problem. What I did was reopen the IndexReader without creating a
>>> new IndexSearcher object. But since Java always passes parameters
>>> by value (no matter what kind of parameter), the old IndexSearcher
>>> object did not see the updated IndexReader object, because
>>> IndexSearcher works with its own instance of IndexReader and not
>>> with a reference to the original IndexReader. So what caused the
>>> problem was that the requests were always sent to the same
>>> instance of IndexSearcher. But when the IndexSearcher had to
>>> access the index physically (on the hard disk), the changes made
>>> by the IndexWriter were of course visible only to the IndexReader
>>> and not to the IndexSearcher.
>>> Is that the explanation, Mike?
>>>
>>> Sascha
>>>
>>> On 01.07.2008 at 10:52, Michael McCandless wrote:
>>>
>>>>
>>>> By "does not help" do you mean CheckIndex never detects this  
>>>> corruption, yet you then hit that exception when searching?
>>>>
>>>> By "reopening fails" what do you mean?  I thought reopen works  
>>>> fine, but then it's only the search that fails?
>>>>
>>>> Mike
>>>>
>>>> Sascha Fahl wrote:
>>>>
>>>>> Checking the index after adding documents and before reopening
>>>>> the IndexReader does not help. After adding documents nothing
>>>>> bad happens and CheckIndex says the index is all right. When I
>>>>> check the index just before the reopen, CheckIndex again detects
>>>>> no corruption and says the index is OK, yet reopening fails.
>>>>>
>>>>> Sascha
>>>>>
>>>>> On 30.06.2008 at 18:34, Michael McCandless wrote:
>>>>>
>>>>>>
>>>>>> This is spooky: that exception means you have some sort of  
>>>>>> index corruption.  The TermScorer thinks it found a doc ID  
>>>>>> 37389, which is out of bounds.
>>>>>>
>>>>>> Reopening IndexReader while IndexWriter is writing should be  
>>>>>> completely fine.
>>>>>>
>>>>>> Is this easily reproduced?  If so, if you could narrow it down
>>>>>> to a sequence of added documents, that'd be awesome.
>>>>>>
>>>>>> It's very strange that you see the corruption go away.  Can you
>>>>>> run CheckIndex (java org.apache.lucene.index.CheckIndex
>>>>>> <indexDir>) to see if it detects any corruption?  In fact, if
>>>>>> you could run CheckIndex after each session of IndexWriter to
>>>>>> isolate which batch of added documents causes the corruption,
>>>>>> that could help us narrow it down.
>>>>>>
>>>>>> Are you changing any of the settings in IndexWriter?  Are you
>>>>>> using multiple threads?  Which exact JRE version and OS are you
>>>>>> using?  Are you creating a new index at the start of each run?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>> Sascha Fahl wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I see some strange behaviour in Lucene. The scenario is as
>>>>>>> follows. While adding documents to my index (every doc is
>>>>>>> pretty small, doc count is about 12000) I have implemented
>>>>>>> custom behaviour for flushing and committing documents to the
>>>>>>> index. Before adding documents to the index I check whether the
>>>>>>> ramDocCount has reached a certain number or whether the last
>>>>>>> commit was a while ago. If so, I flush the buffered documents
>>>>>>> and reopen the IndexWriter. So far, so good. Indexing works very
>>>>>>> well. The problem occurs when I send requests with the
>>>>>>> IndexReader while writing documents with the IndexWriter (I
>>>>>>> send around 10,000 requests to Lucene). I reopen the
>>>>>>> IndexReader every 100 requests (only for testing) if the
>>>>>>> IndexReader is not current. The first 4000 or so requests
>>>>>>> work very well, but afterwards I always get the following
>>>>>>> exception:
>>>>>>>
>>>>>>> java.lang.ArrayIndexOutOfBoundsException: 37389
>>>>>>> 	at org.apache.lucene.search.TermScorer.score(TermScorer.java:126)
>>>>>>> 	at org.apache.lucene.util.ScorerDocQueue.topScore(ScorerDocQueue.java:112)
>>>>>>> 	at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:172)
>>>>>>> 	at org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer.java:146)
>>>>>>> 	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
>>>>>>> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>>>>>>> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
>>>>>>> 	at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:100)
>>>>>>> 	at org.apache.lucene.search.Hits.<init>(Hits.java:67)
>>>>>>> 	at org.apache.lucene.search.Searcher.search(Searcher.java:46)
>>>>>>> 	at org.apache.lucene.search.Searcher.search(Searcher.java:38)
>>>>>>>
>>>>>>> This seems to be a temporary problem, because after opening a
>>>>>>> new IndexReader once all documents have been added, everything
>>>>>>> is OK again and the 10,000 requests all succeed.
>>>>>>>
>>>>>>> So what could be the problem here?
>>>>>>>
>>>>>>> reg,
>>>>>>> sascha
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
> Sascha Fahl
> Software Development
>
> evenity GmbH
> Zu den Mühlen 19
> D-35390 Gießen
>
> Mail: sascha@evenity.net
>
>
>
>
>

