From: Michael McCandless
To: Michael McCandless
Cc: java-user@lucene.apache.org, Sascha Fahl
Subject: Re: Problems with reopening IndexReader while pushing documents to the index
Date: Tue, 1 Jul 2008 08:36:35 -0400

OK, I've opened: https://issues.apache.org/jira/browse/LUCENE-1323

I'll commit the fix (to trunk, to be included in 2.4) soon.

Mike

Michael McCandless wrote:

>
> Aha!  OK, now I see how that led to your exception.
>
> When you create a MultiReader, passing in the array of IndexReaders,
> MultiReader simply holds onto your array.  It also computes & caches
> norms() the first time it's called, based on the total # of docs it
> sees in all the readers in that array.
>
> But then when you re-opened single readers in that array without
> then creating a new MultiReader, the cached norms array became
> "stale", and thus it's easily possible to encounter a docID that's
> out of bounds.
>
> I think a good fix for this sort of trap would be for MultiReader to
> make a private copy of the array that's passed in.  I'll open an
> issue.
>
> Mike
>
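A minimal sketch of the defensive copy Mike proposes above, assuming the
2.3-era MultiReader(IndexReader[]) constructor; the SafeMultiReader
helper is hypothetical, not a Lucene class, and simply keeps the caller's
array from being shared with the MultiReader in the first place:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;

    // Hypothetical helper, not part of Lucene: hand MultiReader a clone
    // of the sub-reader array, so reopening or replacing entries in the
    // caller's array can never invalidate the MultiReader's cached norms
    // behind its back (the same idea as the LUCENE-1323 proposal).
    public final class SafeMultiReader {

        private SafeMultiReader() {}

        public static MultiReader wrap(IndexReader[] subReaders) throws IOException {
            IndexReader[] copy = (IndexReader[]) subReaders.clone();
            return new MultiReader(copy);
        }
    }

Until the fix lands, the application-side workaround stays the same as in
the thread: rebuild the MultiReader (and IndexSearcher) whenever any
sub-reader was actually reopened.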
> Sascha Fahl wrote:
>
>> Yes, I am using IndexReader.reopen(). Here is my code doing this:
>>
>>   public void refreshIndeces() throws CorruptIndexException, IOException {
>>       if ((System.currentTimeMillis() - this.lastRefresh) > this.REFRESH_PERIOD) {
>>           this.lastRefresh = System.currentTimeMillis();
>>           boolean refreshFlag = false;
>>           for (int i = 0; i < this.indeces.length; i++) {
>>               IndexReader newIR = this.indeces[i].reopen();
>>               if (newIR != this.indeces[i]) {
>>                   this.indeces[i].close();
>>                   refreshFlag = true;
>>               }
>>               this.indeces[i] = newIR;
>>           }
>>           if (refreshFlag) {
>>               this.multiReader = new MultiReader(this.indeces);
>>               this.multiSearcher = new IndexSearcher(this.multiReader);
>>           }
>>       }
>>   }
>>
>> As you see, I am using a MultiReader. After creating a new
>> MultiReader + new IndexSearcher the exception goes away. I tested it
>> by updating the index with 50000 documents while sending 60000
>> requests, and nothing bad happened.
>>
>> Sascha
>>
>>
>> On 01.07.2008, at 12:14, Michael McCandless wrote:
>>
>>>
>>> That's interesting.  So you are using IndexReader.reopen() to get
>>> a new reader?  Are you closing the previous reader?
>>>
>>> The exception goes away if you create a new IndexSearcher on the
>>> reopened IndexReader?
>>>
>>> I don't yet see how that could explain the exception, though.  If
>>> you reopen() the underlying IndexReader in an IndexSearcher, the
>>> original IndexReader should still be intact and still searching
>>> the point-in-time snapshot that it had been opened on.
>>> IndexSearcher itself doesn't hold any "state" about the index (I
>>> think); it relies on IndexReader for that.
>>>
>>> Mike
>>>
>>> Sascha Fahl wrote:
>>>
>>>> I think I could solve the "problem". It was not a Lucene-specific
>>>> problem. What I did was reopen the IndexReader but not create a
>>>> new IndexSearcher object. But of course, as Java always passes
>>>> parameters by value (no matter what parameter), the old
>>>> IndexSearcher object did not see the updated IndexReader object,
>>>> because IndexSearcher works with its own reference to the
>>>> IndexReader it was constructed with and not with the reference to
>>>> the original IndexReader variable. So what caused the problem was
>>>> that the requests were always sent to the same instance of
>>>> IndexSearcher. But when the IndexSearcher had to access the index
>>>> physically (the hard disk), changes made by the IndexWriter were
>>>> only visible to the reopened IndexReader and not to the
>>>> IndexSearcher.
>>>> Is that the explanation, Mike?
>>>>
>>>> Sascha
>>>>
>>>> On 01.07.2008, at 10:52, Michael McCandless wrote:
>>>>
>>>>>
>>>>> By "does not help" do you mean CheckIndex never detects this
>>>>> corruption, yet you then hit that exception when searching?
>>>>>
>>>>> By "reopening fails" what do you mean?  I thought reopen works
>>>>> fine, but then it's only the search that fails?
>>>>>
>>>>> Mike
>>>>>
>>>>> Sascha Fahl wrote:
>>>>>
>>>>>> Checking the index after adding documents and before reopening
>>>>>> the IndexReader does not help. After adding documents nothing
>>>>>> bad happens and CheckIndex says the index is all right. But
>>>>>> when I check the index before reopening it, CheckIndex does not
>>>>>> detect any corruption and says the index is OK, and reopening
>>>>>> still fails.
>>>>>>
>>>>>> Sascha
>>>>>>
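The fix Sascha landed on follows from Mike's point about point-in-time
snapshots: reopen() returns a new reader once the index has changed,
while the old reader, and any IndexSearcher built on it, keeps serving
the old snapshot, so the searcher has to be rebuilt as well. A minimal
sketch of that refresh pattern for a single index, assuming 2.3-era
APIs; the SnapshotRefresher class and its field names are illustrative,
not from the thread:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    // Illustrative refresh pattern: reopen() hands back a new reader
    // when the index changed; the old reader/searcher keep their
    // point-in-time snapshot, so the searcher must be rebuilt too.
    public class SnapshotRefresher {
        private IndexReader reader;
        private IndexSearcher searcher;

        public SnapshotRefresher(String indexDir) throws IOException {
            this.reader = IndexReader.open(indexDir);
            this.searcher = new IndexSearcher(reader);
        }

        public synchronized IndexSearcher refresh() throws IOException {
            IndexReader reopened = reader.reopen();
            if (reopened != reader) {
                // The old reader still searches its original snapshot;
                // only a searcher built on the reopened reader sees
                // newly flushed documents.
                searcher.close();
                reader.close();
                reader = reopened;
                searcher = new IndexSearcher(reader);
            }
            return searcher;
        }
    }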
>>>>>> On 30.06.2008, at 18:34, Michael McCandless wrote:
>>>>>>
>>>>>>>
>>>>>>> This is spooky: that exception means you have some sort of
>>>>>>> index corruption.  The TermScorer thinks it found a doc ID of
>>>>>>> 37389, which is out of bounds.
>>>>>>>
>>>>>>> Reopening an IndexReader while IndexWriter is writing should be
>>>>>>> completely fine.
>>>>>>>
>>>>>>> Is this easily reproduced?  If so, if you could narrow it down
>>>>>>> to a sequence of added documents, that'd be awesome.
>>>>>>>
>>>>>>> It's very strange that you see the corruption go away.  Can you
>>>>>>> run CheckIndex (java org.apache.lucene.index.CheckIndex
>>>>>>> <indexDir>) to see if it detects any corruption?  In fact, if
>>>>>>> you could run CheckIndex after each session of IndexWriter to
>>>>>>> isolate which batch of added documents causes the corruption,
>>>>>>> that could help us narrow it down.
>>>>>>>
>>>>>>> Are you changing any of the settings in IndexWriter?  Are you
>>>>>>> using multiple threads?  Which exact JRE version and OS are you
>>>>>>> using?  Are you creating a new index at the start of each run?
>>>>>>>
>>>>>>> Mike
>>>>>>>
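The per-session check Mike asks for can be scripted in a few lines. A
rough harness under 2.3/2.4-era APIs; the BatchChecker name and the
Document[][] batching are assumptions, and CheckIndex is invoked here
through its main(), equivalent to running
java org.apache.lucene.index.CheckIndex <indexDir> by hand after each
writer session:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.index.IndexWriter;

    // Illustrative harness, not from the thread: one IndexWriter session
    // per batch, followed by a CheckIndex pass over the index directory,
    // so the first batch that introduces corruption is caught right away.
    public class BatchChecker {

        public static void indexAndCheck(String indexDir, Document[][] batches) throws Exception {
            for (int b = 0; b < batches.length; b++) {
                // Create the index on the first batch, append afterwards.
                IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), b == 0);
                for (int i = 0; i < batches[b].length; i++) {
                    writer.addDocument(batches[b][i]);
                }
                writer.close();

                // Report which batch was just written, then check the index.
                System.out.println("Checking index after batch " + b);
                CheckIndex.main(new String[] { indexDir });
            }
        }
    }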
The first around 4000 requests =20 >>>>>>>> work very well, but afterwards I always get the following =20 >>>>>>>> exception: >>>>>>>> >>>>>>>> java.lang.ArrayIndexOutOfBoundsException: 37389 >>>>>>>> at = org.apache.lucene.search.TermScorer.score(TermScorer.java:=20 >>>>>>>> 126) >>>>>>>> at =20 >>>>>>>> org=20 >>>>>>>> .apache=20 >>>>>>>> .lucene.util.ScorerDocQueue.topScore(ScorerDocQueue.java:112) >>>>>>>> at =20 >>>>>>>> org=20 >>>>>>>> .apache=20 >>>>>>>> .lucene=20 >>>>>>>> .search=20 >>>>>>>> .DisjunctionSumScorer=20 >>>>>>>> .advanceAfterCurrent(DisjunctionSumScorer.java:172) >>>>>>>> at =20 >>>>>>>> org=20 >>>>>>>> .apache=20 >>>>>>>> .lucene=20 >>>>>>>> .search.DisjunctionSumScorer.next(DisjunctionSumScorer.java:=20 >>>>>>>> 146) >>>>>>>> at =20 >>>>>>>> org=20 >>>>>>>> .apache=20 >>>>>>>> .lucene.search.BooleanScorer2.score(BooleanScorer2.java:319) >>>>>>>> at =20 >>>>>>>> org=20 >>>>>>>> .apache.lucene.search.IndexSearcher.search(IndexSearcher.java:=20= >>>>>>>> 146) >>>>>>>> at =20 >>>>>>>> org=20 >>>>>>>> .apache.lucene.search.IndexSearcher.search(IndexSearcher.java:=20= >>>>>>>> 113) >>>>>>>> at = org.apache.lucene.search.Hits.getMoreDocs(Hits.java:100) >>>>>>>> at org.apache.lucene.search.Hits.(Hits.java:67) >>>>>>>> at = org.apache.lucene.search.Searcher.search(Searcher.java:46) >>>>>>>> at = org.apache.lucene.search.Searcher.search(Searcher.java:38) >>>>>>>> >>>>>>>> This seems to be a temporarily problem because opening a new =20= >>>>>>>> IndexReader after all documents were added everything is ok =20 >>>>>>>> again and the 10.000 requests are all right. >>>>>>>> >>>>>>>> So what could be the problem here? >>>>>>>> >>>>>>>> reg, >>>>>>>> sascha >>>>>>>> >>>>>>>> = --------------------------------------------------------------------- >>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>>>>>>> For additional commands, e-mail: = java-user-help@lucene.apache.org >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >>> >>> = --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>> For additional commands, e-mail: java-user-help@lucene.apache.org >> >> Sascha Fahl >> Softwareenticklung >> >> evenity GmbH >> Zu den M=FChlen 19 >> D-35390 Gie=DFen >> >> Mail: sascha@evenity.net >> >> >> >> >> > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org