lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Problems with reopening IndexReader while pushing documents to the index
Date Tue, 01 Jul 2008 08:52:53 GMT

By "does not help" do you mean CheckIndex never detects this  
corruption, yet you then hit that exception when searching?

By "reopening fails" what do you mean?  I thought reopen works fine,  
but then it's only the search that fails?

Mike

Sascha Fahl wrote:

> Checking the index after adding documents and before reopening the
> IndexReader does not help. After adding documents, nothing bad
> happens and CheckIndex says the index is all right. When I check
> the index just before the reopen, CheckIndex again does not detect
> any corruption and says the index is OK, yet reopening fails.
>
> Sascha
>
> On 30.06.2008 at 18:34, Michael McCandless wrote:
>
>>
>> This is spooky: that exception means you have some sort of index  
>> corruption.  The TermScorer thinks it found a doc ID 37389, which  
>> is out of bounds.
>>
>> Reopening IndexReader while IndexWriter is writing should be  
>> completely fine.
>>
>> Is this easily reproduced?  If so, if you could narrow it down to a
>> sequence of added documents, that'd be awesome.
>>
>> It's very strange that you see the corruption go away.  Can you run  
>> CheckIndex (java org.apache.lucene.index.CheckIndex <indexDir>) to  
>> see if it detects any corruption?  In fact, if you could run  
>> CheckIndex after each session of IndexWriter to isolate which batch  
>> of added documents causes the corruption, that could help us narrow  
>> it down.
>>
>> Are you changing any of the settings in IndexWriter?  Are you using  
>> multiple threads?  Which exact JRE version and OS are you using?   
>> Are you creating a new index at the start of each run?
>>
>> Mike
>>
>> Sascha Fahl wrote:
>>
>>> Hi,
>>>
>>> I see some strange behaviour in Lucene, in the following scenario.
>>> While adding documents to my index (every doc is pretty small, doc
>>> count is about 12,000) I use a custom scheme for flushing and
>>> committing documents to the index. Before adding documents to the
>>> index I check whether the ramDocCount has reached a certain number
>>> or whether the last commit was a while ago. If so, I flush the
>>> buffered documents and reopen the IndexWriter. So far, so good:
>>> indexing works very well. The problem appears when I send requests
>>> through the IndexReader while writing documents with the
>>> IndexWriter (I send around 10,000 requests to Lucene). I reopen
>>> the IndexReader every 100 requests (only for testing) if the
>>> IndexReader is not current. The first 4,000 or so requests work
>>> very well, but afterwards I always get the following exception:
>>>
>>> java.lang.ArrayIndexOutOfBoundsException: 37389
>>> 	at org.apache.lucene.search.TermScorer.score(TermScorer.java:126)
>>> 	at org.apache.lucene.util.ScorerDocQueue.topScore(ScorerDocQueue.java:112)
>>> 	at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:172)
>>> 	at org.apache.lucene.search.DisjunctionSumScorer.next(DisjunctionSumScorer.java:146)
>>> 	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:319)
>>> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>>> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
>>> 	at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:100)
>>> 	at org.apache.lucene.search.Hits.<init>(Hits.java:67)
>>> 	at org.apache.lucene.search.Searcher.search(Searcher.java:46)
>>> 	at org.apache.lucene.search.Searcher.search(Searcher.java:38)
>>>
>>> This seems to be a temporary problem: once all documents have been
>>> added, opening a new IndexReader makes everything OK again, and
>>> all 10,000 requests succeed.
>>>
>>> So what could be the problem here?
>>>
>>> reg,
>>> sascha
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>
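
The flush and reopen scheme described in the original report (flush when the
buffered document count reaches a threshold or the last commit is old enough;
reopen the reader every 100 requests if it is not current) might be sketched
roughly as below. This is only an illustration under stated assumptions, not
the poster's actual code: FlushPolicy, ReopenPolicy, and all of their
parameters are hypothetical names, and no Lucene classes are used, so the
decision logic can be run and tested on its own.

```java
/**
 * Hypothetical helper (not part of Lucene): decides when buffered documents
 * should be flushed, based on a document-count threshold or on the time
 * elapsed since the last flush. Time is passed in explicitly so the logic
 * is deterministic and easy to test.
 */
final class FlushPolicy {
    private final int maxBufferedDocs;   // assumed threshold, e.g. 100
    private final long maxAgeMillis;     // assumed maximum age of the last flush
    private long lastFlushMillis;

    FlushPolicy(int maxBufferedDocs, long maxAgeMillis, long nowMillis) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxAgeMillis = maxAgeMillis;
        this.lastFlushMillis = nowMillis;
    }

    /** True if enough documents are buffered or the last flush is too old. */
    boolean shouldFlush(int bufferedDocs, long nowMillis) {
        return bufferedDocs >= maxBufferedDocs
                || nowMillis - lastFlushMillis >= maxAgeMillis;
    }

    /** Records that a flush has just happened. */
    void markFlushed(long nowMillis) {
        lastFlushMillis = nowMillis;
    }
}

/**
 * Hypothetical helper (not part of Lucene): asks for a reader reopen on
 * every N-th request, and only if the reader is reported as stale.
 */
final class ReopenPolicy {
    private final int checkEvery;
    private long requests;

    ReopenPolicy(int checkEvery) {
        this.checkEvery = checkEvery;
    }

    /** Counts one request; returns true when the reader should be reopened. */
    boolean shouldReopen(boolean readerIsCurrent) {
        requests++;
        return requests % checkEvery == 0 && !readerIsCurrent;
    }
}
```

In a real setup the readerIsCurrent flag would presumably come from
IndexReader.isCurrent(), and a true result would be followed by
IndexReader.reopen(); neither call is exercised in this sketch.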

