lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Problems with reopening IndexReader while pushing documents to the index
Date Tue, 01 Jul 2008 10:14:48 GMT

That's interesting.  So you are using IndexReader.reopen() to get a  
new reader?  Are you closing the previous reader?

The exception goes away if you create a new IndexSearcher on the  
reopened IndexReader?

I don't yet see how that could explain the exception, though.  If you
reopen() the underlying IndexReader in an IndexSearcher, the original
IndexReader should still be intact and still searching the
point-in-time snapshot that it had been opened on.  IndexSearcher
itself doesn't hold any "state" about the index (I think); it relies
on IndexReader for that.
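
To make the point-in-time behaviour concrete, here is a minimal sketch in plain Java. It is only a toy model, not Lucene code: ToyIndex, ToyReader and ToySearcher are hypothetical names made up for illustration, and the sketch assumes that a reader captures the index state once when it is opened and that reopen() hands back a new reader rather than changing the old one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReopenSketch {

    /** Hypothetical stand-in for a shared, growing index (not a Lucene class). */
    static final class ToyIndex {
        private final List<String> docs = new ArrayList<>();

        synchronized void add(String doc) {
            docs.add(doc);
        }

        /** Copies the current contents so later adds cannot change the copy. */
        synchronized List<String> snapshot() {
            return Collections.unmodifiableList(new ArrayList<>(docs));
        }
    }

    /** Hypothetical point-in-time reader: captures the index state once, when opened. */
    static final class ToyReader {
        private final ToyIndex index;
        private final List<String> docs;

        ToyReader(ToyIndex index) {
            this.index = index;
            this.docs = index.snapshot();
        }

        int numDocs() {
            return docs.size();
        }

        /** Returns a NEW reader on the current state; this reader is left untouched. */
        ToyReader reopen() {
            return new ToyReader(index);
        }
    }

    /** Hypothetical searcher: holds no index state of its own, only a reader reference. */
    static final class ToySearcher {
        private final ToyReader reader;

        ToySearcher(ToyReader reader) {
            this.reader = reader;
        }

        int visibleDocs() {
            return reader.numDocs();
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("doc-1");

        ToyReader oldReader = new ToyReader(index);
        ToySearcher oldSearcher = new ToySearcher(oldReader);

        // The "writer" adds a document, then the reader is reopened.
        index.add("doc-2");
        ToyReader newReader = oldReader.reopen();
        ToySearcher newSearcher = new ToySearcher(newReader);

        System.out.println(oldSearcher.visibleDocs()); // prints 1: still the old snapshot
        System.out.println(newSearcher.visibleDocs()); // prints 2: sees the added document
    }
}
```

In this model a searcher built on the old reader keeps returning results from the old snapshot, which is stale but consistent. Seeing new documents requires building a new searcher on the reopened reader.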

Mike

Sascha Fahl wrote:

> I think I have solved the "problem". It was not a Lucene-specific
> problem. What I did was reopen the IndexReader without creating a
> new IndexSearcher object. But since Java always passes parameters
> by value (no matter what the parameter is), the old IndexSearcher
> object did not see the updated IndexReader object, because
> IndexSearcher works with its own instance of IndexReader and
> not with the reference to the original IndexReader. So what caused
> the problem was that the requests were always sent to the same
> instance of IndexSearcher. But when the IndexSearcher had to access
> the index physically (on the hard disk), the changes made by the
> IndexWriter were visible only to the IndexReader and not to the
> IndexSearcher.
> Is that the explanation, Mike?
>
> Sascha
>
> On 01.07.2008 at 10:52, Michael McCandless wrote:
>
>>
>> By "does not help" do you mean CheckIndex never detects this  
>> corruption, yet you then hit that exception when searching?
>>
>> By "reopening fails" what do you mean?  I thought reopen works  
>> fine, but then it's only the search that fails?
>>
>> Mike
>>
>> Sascha Fahl wrote:
>>
>>> Checking the index after adding documents and before reopening the
>>> IndexReader does not help. After adding documents nothing bad
>>> happens and CheckIndex says the index is all right. But when I
>>> check the index before reopening it, CheckIndex does not detect any
>>> corruption and says the index is OK, and yet reopening fails.
>>>
>>> Sascha
>>>
>>> On 30.06.2008 at 18:34, Michael McCandless wrote:
>>>
>>>>
>>>> This is spooky: that exception means you have some sort of index  
>>>> corruption.  The TermScorer thinks it found a doc ID 37389, which  
>>>> is out of bounds.
>>>>
>>>> Reopening IndexReader while IndexWriter is writing should be  
>>>> completely fine.
>>>>
>>>> Is this easily reproduced?  If so, if you could narrow it down to
>>>> a sequence of added documents, that'd be awesome.
>>>>
>>>> It's very strange that you see the corruption go away.  Can you
>>>> run CheckIndex (java org.apache.lucene.index.CheckIndex
>>>> <indexDir>) to see if it detects any corruption?  In fact, if you
>>>> could run CheckIndex after each session of IndexWriter to isolate
>>>> which batch of added documents causes the corruption, that could
>>>> help us narrow it down.
>>>>
>>>> Are you changing any of the settings in IndexWriter?  Are you  
>>>> using multiple threads?  Which exact JRE version and OS are you  
>>>> using?  Are you creating a new index at the start of each run?
>>>>
>>>> Mike
>>>>
>>>> Sascha Fahl wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I see some strange behaviour in Lucene, in the following scenario.
>>>>> While adding documents to my index (every doc is pretty small,
>>>>> doc count is about 12000) I have implemented custom behaviour
>>>>> for flushing and committing documents to the index. Before adding
>>>>> documents to the index I check whether the ramDocCount has
>>>>> reached a certain number or whether the last commit was a while
>>>>> ago. If so, I flush the buffered documents and reopen the
>>>>> IndexWriter. So far, so good. Indexing works very well. The
>>>>> problem appears when I send requests with the IndexReader while
>>>>> writing documents with the IndexWriter (I send around 10,000
>>>>> requests to Lucene). I reopen the IndexReader every 100 requests
>>>>> (only for testing) if the IndexReader is not current. The first
>>>>> 4000 or so requests work very well, but afterwards I always get
>>>>> the following exception:
>>>>>
>>>>> java.lang.ArrayIndexOutOfBoundsException: 37389
>>>>> 	at org.apache.lucene.search.TermScorer.score(TermScorer.java:126)
>>>>> 	at org.apache.lucene.util.ScorerDocQueue.topScore(ScorerDocQueue.java:112)
>>>>> 	at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:172)
>>>>> 	at org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer.java:146)
>>>>> 	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
>>>>> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>>>>> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
>>>>> 	at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:100)
>>>>> 	at org.apache.lucene.search.Hits.<init>(Hits.java:67)
>>>>> 	at org.apache.lucene.search.Searcher.search(Searcher.java:46)
>>>>> 	at org.apache.lucene.search.Searcher.search(Searcher.java:38)
>>>>>
>>>>> This seems to be a temporary problem, because after opening a new
>>>>> IndexReader once all documents have been added, everything is OK
>>>>> again and all 10,000 requests succeed.
>>>>>
>>>>> So what could be the problem here?
>>>>>
>>>>> reg,
>>>>> sascha
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>>>
>>
>

