lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jay ...@AI.SRI.COM>
Subject Re: DefaultIndexAccessor
Date Wed, 06 Feb 2008 05:35:06 GMT
Great effort for much improved indexaccessor, Mark!
A couple questions and observations:

1. In release(Searcher), you removed a check if the given searcher is 
the cached one from an earlier version. This could potentially  cause 
problems for some people.
2. The createdSearchers variable is not really used: you just populate 
it and print it out. What's the purpose for it?
3. The variable numSearchersForRetirment is  used in 
WarmingIndexAccessor not DefaultIndexAccessor.
4. I wish that in the next release of Lucene, they will add searcher 
reopen api so that we do not have to wor around it.
5. Although currently IndexSearcher.close() does almost nothing except 
to close the internal index reader, it might be a safer to close 
searcher itself as well in closeCachedSearcher(), just in case, the 
searcher may have other resources to release in the future version of 
Lucene.

Thanks!

Jay
Mark Miller wrote:
> For anyone following this thread who would like to check this out, I put 
> up the new code with the warming capability:
> 
> https://issues.apache.org/jira/browse/LUCENE-1026
> <https://issues.apache.org/jira/secure/attachment/12374729/IndexAccessor-02.04.2008.zip>

> IndexAccessor-02.04.2008.zip 
> <https://issues.apache.org/jira/secure/attachment/12374729/IndexAccessor-02.04.2008.zip>

> (32 kb)
> 
> See the comment at the bottom.
> 
> Cam Bazz wrote:
>> Hello Mark,
>>
>> Thank you for your lengthy and valuable clarification. I have the case -
>> before adding to the index, i must check if a document exist with the
>> same key (actually, double key) - or before deleting a document - I must
>> ensure it exists in the index.
>>
>> Currently I am doing it with my custom caching routine. It works quite 
>> well
>> upto 32M documents. but after that something happens and it really slows
>> down.
>>
>> I will experiment with your implementation, as soon as I can. It is very
>> cool by the way. Will it be included in the next release?
>>
>> Best,
>> -C.B.
>>
>> On Feb 4, 2008 7:15 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>
>>  
>>> The purpose of IndexAccessor is to coordinate Readers/Writers for a
>>> Lucene index. Readers and Writers in Lucene are multi-threaded in that
>>> multiple threads may use them at the same time, but they must/should be
>>> shared and there are special rules (You cannot delete with a Reader
>>> while a Writer is working on the index). Also, you need to refresh
>>> Reader views every so often; this is expensive (though usually much less
>>> so with the new reopen method).
>>>
>>> IndexAccessor enforces the rules and controls Reader refreshing. Instead
>>> of worrying about caching or index interaction rules, you just ask for
>>> your Reader/Writer, use it to search or add a doc, and then return it.
>>> The rest is taken care of for you.
>>>
>>> This is done by keeping a cached Writer and Searcher(s) that all threads
>>> share. References to the Searchers are counted so that after a Writer is
>>> returned (and no other thread has a reference to the Writer),
>>> IndexAccessor waits for all of the current Searchers to come back and
>>> then reopens their Readers.
>>>
>>> In this regard, you get a  similar setup to what Solr might give: from
>>> any thread you just add docs and run searches -- you don't have to worry
>>> about refreshing Readers or sharing Writers/Readers or one thread
>>> deleting with a Reader while another thread tries to write with a 
>>> Writer.
>>>
>>> This setup allows you to do other cool things, like warm Searchers
>>> before putting them into action. Thats what the code I am posting soon
>>> is be capable of - when the Readers are reopened, search requests will
>>> still be handled by the old Readers while the new Searchers run a sample
>>> query with optional sort fields. This will make sure the Reader is open
>>> and its sort caches are loaded before the first thread tries to use it.
>>> Much faster response to applications.
>>>
>>> You must  open a new Reader or reopen a Reader to see recently added
>>> docs...IndexAccessor provides no real way around that. But it does make
>>> the reopening much easier -- and your application that just wants to add
>>> docs and search at will from multiple threads, won't have to worry about
>>> it.
>>>
>>> You can bail out here, or if you want further clarification I will
>>> include an alternate attempt at what IndexAccessor is below.
>>>
>>> - Mark
>>>
>>>
>>> ----------------------------------------------------------------------------------------------------

>>>
>>> When accessing a Lucene index from multiple threads, there are a variety
>>> of issues that you must address.
>>>
>>> 1. The Readers/Writer should be shared across threads.
>>> 2. Readers must periodically be refreshed, either be creating new
>>> instances or using the new reopen method.
>>> 3. A Reader that writes needs to be properly coordinated with a Writer
>>> eg they cannot be used at the same time.
>>>
>>> IndexAccessor addresses each of these issues.
>>>
>>> How it works:
>>>
>>> A single Writer is shared among threads that try to concurrently
>>> retrieve and use a Writer. Once all of these threads release their
>>> reference
>>> to the Writer, it is closed and upon the next request a new one is
>>> created.
>>>
>>> A single Searcher for each Similarity is also shared across threads.
>>> Upon first request, a new Searcher is created. This Searcher is then
>>> returned
>>> upon every request. A count of every Searcher reference retrieved is
>>> maintained.
>>>
>>> When all references to a Writer are released, the Writer is closed and
>>> after waiting for all of the Searchers to be returned, the Searchers are
>>> reopened. Without warming enabled, new requests for Searchers/Readers
>>> must wait for this reopen to complete. If warming is enabled, the old
>>> Searchers/Readers continue handling Searcher requests until the Readers
>>> have been reopened and any requested sort caches have been loaded.
>>>
>>> If you ask for a writing Reader, you will not get it until a Writer is
>>> released and vice versa.
>>>
>>> The result is that you can freely use Writers/Readers/Searchers from any
>>> thread without considering thread interactions. ***
>>>
>>> If you want to add docs, just ask for a Writer, add the docs, and
>>> release the Writer. If you want to search, get a Searcher, search,
>>> and release the Searcher. You don't have to worry about reopening
>>> Readers or coordinating access.
>>>
>>>
>>> ***
>>> You still do have to consider things like hogging the Writer/Readers -
>>> if you don't occasionally release them, things will not stay very
>>> interactive.
>>> The best method is to just get the object, use it, and then return it in
>>> a finally block. Batch load multiple docs, but if your just randomly
>>> adding
>>> a doc, get the Writer, add it, and then release the Writer in a finally
>>> block. If you are batch loading a million docs and you want to be able
>>> to see them
>>> as they are added: get the writer and add 10,000 docs (or something),
>>> release the Writer, get the Writer and add 10,000 docs, etc.
>>>
>>> Cam Bazz wrote:
>>>    
>>>> Hello Mark,
>>>>
>>>> I have been reading the code - and honestly I have not understood 
>>>> how it
>>>> works. I was hoping that this was a solution to the case when you are
>>>>       
>>> adding
>>>    
>>>> documents - in a multithreaded way, it allows other non-writer threads
>>>>       
>>> to be
>>>    
>>>> able to see documents added without refreshing the indexsearcher - by
>>>>       
>>> using
>>>    
>>>> some caching mechanism.
>>>>
>>>> Could you elaborate what IndexAccessor does and how it does it a little
>>>>       
>>> bit
>>>    
>>>> more?
>>>>
>>>> Best Regards,
>>>> -C.B.
>>>>
>>>> On Feb 4, 2008 3:06 PM, Mark Miller <markrmiller@gmail.com> wrote:
>>>>
>>>>
>>>>      
>>>>> IndexAccessor-1.26.2008.zip is the latest one. I will be dating a zip
>>>>>         
>>> from
>>>    
>>>>> now on.
>>>>>
>>>>> I hope to post new code with the warming either tonight or tomorrow
>>>>>         
>>> night.
>>>    
>>>>> I would be ecstatic to have some help vetting that.
>>>>>
>>>>> Also, I am thinking of making a change so that when you release the
>>>>>         
>>> Writer
>>>    
>>>>> the thread that releases does not block until reopen. I think the
>>>>>         
>>> original
>>>    
>>>>> author did this so that if you add a doc with a thread and then
>>>>>         
>>> immediately
>>>    
>>>>> search from the same thread, you are guaranteed to find the doc.
>>>>>         
>>> However,
>>>    
>>>>> this gaurentee did not hold -- if another thread had a reference to 
>>>>> the
>>>>> Writer and a new thread grabbed a Writer and then quicly released
>>>>>         
>>> before the
>>>    
>>>>> first thread, you will have added a doc but it will not be visible
>>>>>         
>>> until the
>>>    
>>>>> first thread releases its reference to the Writer...since the concept
>>>>>         
>>> is not
>>>    
>>>>> enforced anyway, you might as well not block for the final thread that
>>>>> releases the Writer either. Instead I will grab a thread from a thread
>>>>>         
>>> pool
>>>    
>>>>> to do the reopening with that thread, and return right after closing
>>>>>         
>>> the
>>>    
>>>>> Writer. The result is that you cannot add a doc and search and expect
>>>>>         
>>> to
>>>    
>>>>> find it without waiting a second or too. But this way things will be
>>>>> consistent, and an app that adds docs will be a bit more
>>>>>         
>>> responsive....eg it
>>>    
>>>>> wont hang as Readers are being reopened.
>>>>>
>>>>> I also have to bring the AccessProvider classes back. No easy way to
>>>>>         
>>> use
>>>    
>>>>> your own custom Readers without it...I shouldn't have stripped it out.
>>>>>
>>>>> - Mark
>>>>>
>>>>>
>>>>>
>>>>> Cam Bazz wrote:
>>>>>
>>>>>        
>>>>>> Hello,
>>>>>>
>>>>>> Regarding https://issues.apache.org/jira/browse/LUCENE-1026 , this
>>>>>>           
>>> seems
>>>    
>>>>>> very interesting. I have read the discussion on the page, but I could
>>>>>>
>>>>>>           
>>>>> not
>>>>>
>>>>>        
>>>>>> figure out which set of files is the latest.
>>>>>> Is it the IndexAccessor-1.26.2008.zip file?
>>>>>>
>>>>>> I will read through the code, make my own tests, and send some
>>>>>>           
>>> feedback.
>>>    
>>>>>> Best.
>>>>>> -C.B.
>>>>>>
>>>>>>
>>>>>>
>>>>>>           
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>>>         
>>>>       
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>     
>>
>>   
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message