lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: DefaultIndexAccessor
Date Mon, 04 Feb 2008 17:15:53 GMT
The purpose of IndexAccessor is to coordinate Readers/Writers for a 
Lucene index. Readers and Writers in Lucene are thread-safe in that 
multiple threads may use them at the same time, but they should be 
shared across threads and there are special rules (for example, you 
cannot delete with a Reader while a Writer is working on the index). 
Also, you need to refresh Reader views every so often; this is expensive 
(though usually much less so with the new reopen method).

IndexAccessor enforces the rules and controls Reader refreshing. Instead 
of worrying about caching or index interaction rules, you just ask for 
your Reader/Writer, use it to search or add a doc, and then return it. 
The rest is taken care of for you.

This is done by keeping a cached Writer and Searcher(s) that all threads 
share. References to the Searchers are counted so that after a Writer is 
returned (and no other thread has a reference to the Writer), 
IndexAccessor waits for all of the current Searchers to come back and 
then reopens their Readers.

In this regard, you get a similar setup to what Solr might give: from 
any thread you just add docs and run searches -- you don't have to worry 
about refreshing Readers or sharing Writers/Readers or one thread 
deleting with a Reader while another thread tries to write with a Writer.

This setup allows you to do other cool things, like warm Searchers 
before putting them into action. That's what the code I am posting soon 
will be capable of: when the Readers are reopened, search requests will 
still be handled by the old Readers while the new Searchers run a sample 
query with optional sort fields. This makes sure the Reader is open 
and its sort caches are loaded before the first thread tries to use it, 
so applications get a much faster first response.
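
To make the warming idea concrete, here is a minimal sketch in plain Java. 
It does not use Lucene or the real IndexAccessor code; WarmableSearcher, 
WarmingSwapSketch, and their methods are hypothetical names invented for 
illustration, and reference counting is left out to keep it short. The 
point is only the ordering: the new searcher runs its sample query before 
it replaces the old one, so callers never see a cold searcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a Searcher over a point-in-time Reader. */
class WarmableSearcher {
    private final List<String> docs;
    private volatile boolean warmed;

    WarmableSearcher(List<String> docs) {
        this.docs = new ArrayList<>(docs); // snapshot, like an opened Reader
    }

    /** Counts docs containing the term; stands in for a real query. */
    int search(String term) {
        int hits = 0;
        for (String doc : docs) {
            if (doc.contains(term)) hits++;
        }
        return hits;
    }

    /** Runs a sample query so caches would be loaded before real traffic. */
    void warm(String sampleQuery) {
        search(sampleQuery);
        warmed = true;
    }

    boolean isWarmed() { return warmed; }
}

/** Hypothetical holder that swaps in a new searcher only after warming it. */
class WarmingSwapSketch {
    private final AtomicReference<WarmableSearcher> live;

    WarmingSwapSketch(List<String> initialDocs) {
        WarmableSearcher first = new WarmableSearcher(initialDocs);
        first.warm("");
        live = new AtomicReference<>(first);
    }

    /** Search threads always get whichever searcher is currently live. */
    WarmableSearcher current() { return live.get(); }

    /** The old searcher keeps serving requests until the new one is warm. */
    void reopen(List<String> latestDocs, String sampleQuery) {
        WarmableSearcher fresh = new WarmableSearcher(latestDocs);
        fresh.warm(sampleQuery);
        live.set(fresh); // atomic swap; later requests see the new docs
    }
}
```

A thread that fetched the old searcher before the swap can keep using it; 
it simply will not see the newly added docs until it asks again.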

You must open a new Reader or reopen a Reader to see recently added 
docs; IndexAccessor provides no real way around that. But it does make 
the reopening much easier, and an application that just wants to add 
docs and search at will from multiple threads won't have to worry about it.

You can bail out here, or if you want further clarification, I have 
included an alternate attempt at explaining what IndexAccessor is below.

- Mark

----------------------------------------------------------------------------------------------------
When accessing a Lucene index from multiple threads, there are a variety 
of issues that you must address.

1. The Readers/Writer should be shared across threads.
2. Readers must periodically be refreshed, either by creating new 
instances or using the new reopen method.
3. A Reader that writes (for example, deletes docs) needs to be properly 
coordinated with a Writer; they cannot be used at the same time.

IndexAccessor addresses each of these issues.

How it works:

A single Writer is shared among threads that try to concurrently 
retrieve and use a Writer. Once all of these threads release their 
reference to the Writer, it is closed and upon the next request a new 
one is created.

A single Searcher for each Similarity is also shared across threads. 
Upon first request, a new Searcher is created. This Searcher is then 
returned upon every request. A count of every Searcher reference 
retrieved is maintained.

When all references to a Writer are released, the Writer is closed and 
after waiting for all of the Searchers to be returned, the Searchers are
reopened. Without warming enabled, new requests for Searchers/Readers 
must wait for this reopen to complete. If warming is enabled, the old
Searchers/Readers continue handling Searcher requests until the Readers 
have been reopened and any requested sort caches have been loaded.

If you ask for a writing Reader, you will not get it until a Writer is 
released and vice versa.
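
The reference counting described above can be sketched roughly as follows. 
This is plain Java with no Lucene dependency, and it is not the actual 
IndexAccessor code: AccessorSketch, SketchWriter, SketchSearcher, and all 
method names are hypothetical, and the sketch covers only the no-warming 
case with a single Searcher (no per-Similarity cache, no writing Readers).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Writer: buffers docs until close() commits them. */
class SketchWriter {
    private final List<String> committed;
    private final List<String> pending = new ArrayList<>();
    SketchWriter(List<String> committed) { this.committed = committed; }
    synchronized void addDocument(String doc) { pending.add(doc); }
    synchronized void close() { committed.addAll(pending); pending.clear(); }
}

/** Hypothetical stand-in for a Searcher: a point-in-time snapshot of the index. */
class SketchSearcher {
    private final List<String> snapshot;
    SketchSearcher(List<String> committed) { snapshot = new ArrayList<>(committed); }
    boolean contains(String doc) { return snapshot.contains(doc); }
}

/** Hypothetical accessor: one shared Writer, one shared Searcher, both counted. */
class AccessorSketch {
    private final List<String> committed = new ArrayList<>();
    private SketchWriter writer;
    private SketchSearcher searcher;
    private int writerRefs;
    private int searcherRefs;
    private boolean reopening;

    synchronized SketchWriter getWriter() {
        if (writer == null) writer = new SketchWriter(committed);
        writerRefs++;
        return writer;
    }

    /** The last release closes the Writer, waits for Searchers, then reopens. */
    synchronized void releaseWriter() {
        if (writerRefs == 0) throw new IllegalStateException("no Writer checked out");
        if (--writerRefs > 0) return;
        writer.close();
        writer = null;
        reopening = true; // new Searcher requests now wait for the reopen
        try {
            while (searcherRefs > 0) awaitChange();
            searcher = new SketchSearcher(committed);
        } finally {
            reopening = false;
            notifyAll();
        }
    }

    synchronized SketchSearcher getSearcher() {
        while (reopening) awaitChange();
        if (searcher == null) searcher = new SketchSearcher(committed);
        searcherRefs++;
        return searcher;
    }

    synchronized void releaseSearcher() {
        if (searcherRefs == 0) throw new IllegalStateException("no Searcher checked out");
        searcherRefs--;
        notifyAll(); // wake a Writer release that is waiting to reopen
    }

    /** Must be called while holding this object's monitor. */
    private void awaitChange() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

Note that in this sketch a thread that releases the last Writer reference 
while still holding a Searcher would wait on itself, so return the 
Searcher first.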

The result is that you can freely use Writers/Readers/Searchers from any 
thread without considering thread interactions. ***

If you want to add docs, just ask for a Writer, add the docs, and 
release the Writer. If you want to search, get a Searcher, search,
and release the Searcher. You don't have to worry about reopening 
Readers or coordinating access.


***
You still do have to consider things like hogging the Writer/Readers: 
if you don't occasionally release them, things will not stay very 
interactive.
The best method is to just get the object, use it, and then return it in 
a finally block. Batch load multiple docs, but if you're just randomly 
adding a doc, get the Writer, add it, and then release the Writer in a 
finally block. If you are batch loading a million docs and you want to 
be able to see them as they are added: get the Writer and add 10,000 
docs (or something), release the Writer, get the Writer and add 10,000 
docs, etc.
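
As a rough illustration of that get/use/release-in-finally pattern applied 
to batches, here is a small self-contained Java sketch. BatchAccessor, 
BatchWriter, and BatchLoader are hypothetical stand-ins rather than 
IndexAccessor classes, and the accessor is deliberately simplified (no 
reference counting) so that the loop structure is the only thing on show.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Writer handle: buffered docs become visible only on release. */
class BatchWriter {
    final List<String> buffer = new ArrayList<>();
    void addDocument(String doc) { buffer.add(doc); }
}

/** Hypothetical, simplified accessor that records how often the Writer is released. */
class BatchAccessor {
    final List<String> visible = new ArrayList<>();
    int releases;
    private BatchWriter writer;

    synchronized BatchWriter getWriter() {
        if (writer == null) writer = new BatchWriter();
        return writer;
    }

    /** Stands in for "close the Writer and reopen the Readers". */
    synchronized void release(BatchWriter w) {
        visible.addAll(w.buffer);
        w.buffer.clear();
        releases++;
    }
}

class BatchLoader {
    /** Adds docs in chunks, releasing the Writer after each chunk. */
    static void load(BatchAccessor accessor, List<String> docs, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            BatchWriter writer = accessor.getWriter();
            try {
                for (String doc : docs.subList(start, end)) {
                    writer.addDocument(doc);
                }
            } finally {
                // Always return the Writer, even if adding a doc throws.
                accessor.release(writer);
            }
        }
    }
}
```

With a batch size of 10,000, each release is the point at which searching 
threads could start seeing the docs added so far.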

Cam Bazz wrote:
> Hello Mark,
>
> I have been reading the code - and honestly I have not understood how it
> works. I was hoping that this was a solution to the case when you are adding
> documents - in a multithreaded way, it allows other non-writer threads to be
> able to see documents added without refreshing the indexsearcher - by using
> some caching mechanism.
>
> Could you elaborate what IndexAccessor does and how it does it a little bit
> more?
>
> Best Regards,
> -C.B.
>
> On Feb 4, 2008 3:06 PM, Mark Miller <markrmiller@gmail.com> wrote:
>
>   
>> IndexAccessor-1.26.2008.zip is the latest one. I will be dating a zip from
>> now on.
>>
>> I hope to post new code with the warming either tonight or tomorrow night.
>> I would be ecstatic to have some help vetting that.
>>
>> Also, I am thinking of making a change so that when you release the Writer
>> the thread that releases does not block until reopen. I think the original
>> author did this so that if you add a doc with a thread and then immediately
>> search from the same thread, you are guaranteed to find the doc. However,
>> this guarantee did not hold -- if another thread had a reference to the
>> Writer and a new thread grabbed a Writer and then quickly released before the
>> first thread, you will have added a doc but it will not be visible until the
>> first thread releases its reference to the Writer...since the concept is not
>> enforced anyway, you might as well not block for the final thread that
>> releases the Writer either. Instead I will grab a thread from a thread pool
>> to do the reopening with that thread, and return right after closing the
>> Writer. The result is that you cannot add a doc and search and expect to
>> find it without waiting a second or two. But this way things will be
>> consistent, and an app that adds docs will be a bit more responsive, e.g. it
>> won't hang as Readers are being reopened.
>>
>> I also have to bring the AccessProvider classes back. No easy way to use
>> your own custom Readers without it...I shouldn't have stripped it out.
>>
>> - Mark
>>
>>
>>
>> Cam Bazz wrote:
>>     
>>> Hello,
>>>
>>> Regarding https://issues.apache.org/jira/browse/LUCENE-1026 , this seems
>>> very interesting. I have read the discussion on the page, but I could not
>>> figure out which set of files is the latest.
>>> Is it the IndexAccessor-1.26.2008.zip file?
>>>
>>> I will read through the code, make my own tests, and send some feedback.
>>>
>>> Best.
>>> -C.B.
>>>
>>>
>>>       
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>     
>
>   


