lucene-java-user mailing list archives

From Harald Kirsch <Harald.Kir...@raytion.com>
Subject Re: Problem with near realtime search
Date Sat, 04 Aug 2012 05:58:59 GMT
Hello Simon,

now that I know what to search for, I found

http://wiki.apache.org/lucene-java/LuceneFAQ#When_is_it_possible_for_document_IDs_to_change.3F

So that clearly explains this issue for me.
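
For readers of the archive, here is a toy model in plain Java of what the FAQ entry describes. It is not Lucene code: `Snapshot` and `mergeDropping` are made-up names, and a "snapshot" is only a stand-in for one opened reader. It assumes nothing more than that a merge squeezes out deleted documents, so the IDs of later documents shift down.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model, NOT Lucene code: a doc ID is just a position inside one snapshot. */
final class DocIdSnapshotDemo {

    /** Immutable point-in-time view; plays the role of one opened reader. */
    static final class Snapshot {
        private final List<String> docs;

        Snapshot(List<String> docs) {
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }

        /** Returns the stored value, rejecting IDs outside this snapshot. */
        String doc(int docId) {
            if (docId < 0 || docId >= docs.size()) {
                throw new IllegalArgumentException("docID must be >= 0 and < maxDoc="
                        + docs.size() + " (got docID=" + docId + ")");
            }
            return docs.get(docId);
        }

        /** Returns the IDs (positions) of all documents starting with the prefix. */
        List<Integer> search(String prefix) {
            List<Integer> hits = new ArrayList<>();
            for (int i = 0; i < docs.size(); i++) {
                if (docs.get(i).startsWith(prefix)) {
                    hits.add(i);
                }
            }
            return hits;
        }
    }

    /** Simulates a merge that squeezes out one deleted document; later IDs shift down. */
    static Snapshot mergeDropping(Snapshot old, int deletedId) {
        List<String> kept = new ArrayList<>(old.docs);
        kept.remove(deletedId); // int argument, so this removes by index
        return new Snapshot(kept);
    }

    public static void main(String[] args) {
        Snapshot before = new Snapshot(Arrays.asList("a1", "b1", "a2", "b2"));
        List<Integer> hits = before.search("a");   // IDs collected on the old snapshot
        Snapshot after = mergeDropping(before, 1); // "b1" is dropped, later IDs shift

        for (int id : hits) {
            // Old snapshot: always the matching document. New snapshot: possibly another one.
            System.out.println(id + ": " + before.doc(id) + " vs " + after.doc(id));
        }
        try {
            after.doc(3); // valid on 'before', out of range on 'after'
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, ID 2 resolves to "a2" on the old snapshot but to "b2" on the merged one, and ID 3 is rejected outright. The second case is the same kind of failure as the exception quoted further down in this thread.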

Many thanks for your help.

Harald



On 04.08.2012 07:38, Harald Kirsch wrote:
> Hello Simon,
>
> thanks for the information. I really thought that once a docId is
> assigned, it is kept until the document is deleted. The only problem I
> would have expected is docIds that no longer refer to a document
> because it was deleted in the meantime. But this is clearly not the case
> in my setup.
>
> But if docIds change during index rearrangement, then this would of
> course completely explain the symptoms I saw.
>
> So docIds can definitely change under the hood?
>
> Harald.
>
>
> On 03.08.2012 17:24, Simon Willnauer wrote:
>> hey harald,
>>
>> if you use a possibly different searcher (reader) than the one you used
>> for the search, you will run into problems with the doc IDs, since they
>> might change during the request. I suggest you use SearcherManager
>> or NRTManager and carry the searcher reference along when you collect
>> the stored values. Just keep around the searcher you used and
>> NRTManager / SearcherManager will do the job for you.
>>
>> simon
>>
>> On Fri, Aug 3, 2012 at 3:41 PM, Harald Kirsch
>> <Harald.Kirsch@raytion.com> wrote:
>>> I am trying to (mis)use Lucene a bit like a NoSQL database or, rather, a
>>> persistent map. I am adding 38000 documents to the index at a rate of
>>> 1000/s. Because each add may actually be an update, I have a
>>> read/change/write sequence for each of the documents.
>>>
>>> All goes well until, just after writing the last item, I run a query
>>> that retrieves about 16000 documents. All docIds are collected in a
>>> Collector, and, yes, I make sure to rebase the docIds. Then I iterate
>>> over all docIds found and retrieve the documents basically like this:
>>>
>>>    for(int docId : docIds) {
>>>      Document d = getSearcher().doc(docId);
>>>      ..
>>>    }
>>>
>>> where getSearcher() uses IndexReader.openIfChanged() to always get the
>>> most current searcher and makes sure to eventually close the old searcher.
>>>
>>>
>>> At document 15940 I get an exception like this:
>>>
>>> Exception in thread "main" java.lang.IllegalArgumentException: docID must be >= 0 and < maxDoc=1 (got docID=1)
>>>          at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:490)
>>>          at org.apache.lucene.index.DirectoryReader.document(DirectoryReader.java:568)
>>>          at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:264)
>>>
>>> I can get rid of the exception in one of two ways, neither of which I
>>> like:
>>>
>>> 1) Put a Thread.sleep(1000) just before running the query+document
>>> retrieval part.
>>>
>>> 2) Use the same IndexSearcher to retrieve all documents instead of
>>> calling getSearcher() for each document retrieval.
>>>
>>> This is just a single-threaded test program. Besides the main thread, I
>>> only see Lucene Merge threads in jvisualvm. A breakpoint on the exception
>>> shows that org.apache.lucene.index.DirectoryReader.document does seem to
>>> have the wrong segments, which triggers the exception.
>>>
>>> Since Lucene 3.6.1 has been in production use for some time, I doubt it
>>> is a bug in Lucene, but I don't see what I am doing wrong. It might be
>>> connected to trying to get the freshest IndexReader for retrieving
>>> documents.
>>>
>>> Any better ideas or explanations?
>>>
>>> Harald.
>>>
>>> --
>>> Harald Kirsch
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>>
>
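
Simon's advice above (keep the searcher you searched with until you have fetched the stored values) can be sketched as a toy reference-counting model in plain Java. This is not Lucene code and not how SearcherManager is implemented; `View` and `Manager` are hypothetical names, and the only thing borrowed from the real API is the acquire/release idea: a view that is still in use stays open even after a refresh has replaced it.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model, NOT Lucene code: pin one point-in-time view for a whole request. */
final class PinnedSearcherDemo {

    /** Stand-in for a searcher: a generation number plus a reference count. */
    static final class View {
        final int generation;
        private final AtomicInteger refs = new AtomicInteger(1); // 1 = held by the manager
        private volatile boolean closed;

        View(int generation) {
            this.generation = generation;
        }

        boolean isClosed() {
            return closed;
        }

        void incRef() {
            refs.incrementAndGet();
        }

        void decRef() {
            // The view is closed only when the last holder lets go.
            if (refs.decrementAndGet() == 0) {
                closed = true;
            }
        }
    }

    /** Hypothetical manager: hands out the current view and swaps it on refresh. */
    static final class Manager {
        private View current = new View(0);

        synchronized View acquire() {
            current.incRef();
            return current;
        }

        void release(View view) {
            view.decRef();
        }

        synchronized void refresh() {
            View old = current;
            current = new View(old.generation + 1);
            old.decRef(); // drop the manager's own reference to the old view
        }
    }

    public static void main(String[] args) {
        Manager manager = new Manager();
        View pinned = manager.acquire(); // search AND document retrieval use this view
        try {
            manager.refresh(); // the index changes in the middle of the request
            System.out.println("generation=" + pinned.generation
                    + " closed=" + pinned.isClosed());
        } finally {
            manager.release(pinned); // the old view may be closed only now
        }
        System.out.println("closed after release=" + pinned.isClosed());
    }
}
```

The design point is that the request never asks for "the freshest" view between collecting IDs and resolving them; it acquires once, uses that one view throughout, and releases in a finally block.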

-- 
Harald Kirsch
Raytion GmbH
Kaiser-Friedrich-Ring 74
40547 Duesseldorf
Phone +49-211-550266-0
Fax +49-211-550266-19
http://www.raytion.com


