lucene-java-user mailing list archives

From Jay Yu ...@AI.SRI.COM>
Subject Re: thread safe shared IndexSearcher
Date Thu, 20 Sep 2007 15:45:33 GMT
Mark,

Thanks for sharing your valuable experience and thoughts.
Frankly, our system already has most of the functionality
LuceneIndexAccessor offers. The only thing I am looking for is a way to
synchronize closing the searchers. That's why I am a little worried about
the way the accessor handles searcher synchronization.
I will probably give it a try to see how it performs in our system.

Thanks!

Jay

Mark Miller wrote:
> The method is synchronized, but this is because each thread *does* share
> the same Searcher. To maintain a cache of searchers across multiple
> threads, you've got to sync -- to reference count, you've got to sync. The
> performance hit of LuceneIndexAccessor is pretty minimal for its
> functionality, and frankly, for the functionality you want, you have to
> pay a cost. That's not even the end of it really...you're going to need to
> maintain a cache of Accessor objects for each index as well...and if you
> don't know all the indexes at startup time, access to this will also need
> to be synchronized. I wouldn't worry though -- searches are still
> lightning fast...that won't be the bottleneck. I'll work on getting you
> some code, but if you're worried, try some benchmarking on the original
> code.
> 
> Also, to be clear, I don't have the code in front of me, but getting a
> Searcher does not require waiting for a Writer to be released. Searchers
> are cached and reused (and instantly available) until a Writer is
> released. When this happens, the method that releases the Writer waits for
> all the Searchers to return (happens pretty quick as searches are pretty
> quick), the Searcher cache is cleared, and then subsequent calls to
> getSearcher create new Searchers that can see what the Writer added.
> 
> The key is to use your Writer/Searcher/Reader quickly and then release it
> (unless you're bulk loading). I've had such a system with 5+ million docs
> on a standard machine and searches were still well below a second after
> the first Searcher is cached (and even the first search is darn quick).
> And that includes a lot of extra crap I am doing.
> 
> - Mark
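
[Editorial sketch, not part of the original message.] The caching and
reference-counting behaviour Mark describes can be illustrated roughly as
follows. This is a minimal sketch under stated assumptions, not the
LuceneIndexAccessor source: the class name SharedSearcherCache and its
methods acquire, release, and invalidate are hypothetical, and the searcher
type is left generic so the example runs without Lucene on the classpath.
Blocking new acquire calls while the cache is being drained is a design
choice of this sketch (it avoids starving the invalidating thread); the
thread does not say how the original code handles that case.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical reference-counted cache for one shared searcher-like object. */
final class SharedSearcherCache<S> {
    private final Supplier<S> opener;  // opens a fresh searcher on demand
    private final Consumer<S> closer;  // closes a searcher nobody is using
    private S cached;                  // current shared instance, or null
    private int inUse;                 // outstanding acquire() calls
    private boolean draining;          // true while invalidate() is waiting

    SharedSearcherCache(Supplier<S> opener, Consumer<S> closer) {
        this.opener = opener;
        this.closer = closer;
    }

    /** Returns the cached searcher, opening one if the cache is empty. */
    synchronized S acquire() {
        while (draining) await();      // only blocks during an invalidation
        if (cached == null) cached = opener.get();
        inUse++;
        return cached;
    }

    /** Hands a searcher back; wakes a waiting invalidate() when the count hits zero. */
    synchronized void release(S searcher) {
        if (searcher != cached || inUse == 0) {
            throw new IllegalStateException("searcher was not handed out by this cache");
        }
        if (--inUse == 0) notifyAll();
    }

    /** Call after a writer is released: wait for searches to finish, then drop the cache. */
    synchronized void invalidate() {
        while (draining) await();      // one invalidation at a time
        draining = true;
        try {
            while (inUse > 0) await(); // wait for all searchers to be returned
            if (cached != null) {
                closer.accept(cached);
                cached = null;         // next acquire() opens a fresh searcher
            }
        } finally {
            draining = false;
            notifyAll();
        }
    }

    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

With this shape, a search thread would call acquire, run its query, and call
release in a finally block, so the searcher is only closed once no thread
is still using it.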
> 
> Jay Yu wrote:
>> Mark,
>>
>> After reading the implementation of LuceneIndexAccessor.getSearcher(),
>> I realized that the method is synchronized and waits for
>> writingDirector to be released. That means if we call getSearcher for
>> each query in each thread, there might be contention and a performance
>> hit. In fact, even the release(searcher) method is costly. On the other
>> hand, if multiple threads share one searcher, then it would defeat the
>> purpose of using LuceneIndexAccessor.
>> Am I missing something here? What's your suggested use case for
>> LuceneIndexAccessor?
>>
>> Thanks!
>>
>> Jay
>> Mark Miller wrote:
>>> I'll respond a point at a time:
>>>
>>> 1.
>>>
>>> ****************************** Hi Maik,
>>>
>>> So what happens in this case:
>>>
>>> IndexAccessProvider accessProvider =
>>>     new IndexAccessProvider(directory, analyzer);
>>> LuceneIndexAccessor accessor = new LuceneIndexAccessor(accessProvider);
>>> accessor.open();
>>>
>>> IndexWriter writer = accessor.getWriter();
>>> // reference to the same instance?
>>> IndexWriter writer2 = accessor.getWriter();
>>>
>>> writer.addDocument(....);
>>> writer2.addDocument(....);
>>>
>>> // I didn't release the writer yet
>>> // will this block?
>>> IndexReader reader = accessor.getReader();
>>> reader.delete(....);
>>>
>>> ************
>>>
>>> This is not really an issue. First, if you are going to delete with a
>>> Reader, you need to call getWritingReader and not getReader. When you do
>>> that, the getWritingReader call will block until writer and writer2 are
>>> released. If you are just adding a couple of docs before releasing the
>>> writers, this is no problem because the block will be very short. If you
>>> are loading tons of docs and you want to be able to delete with a Reader
>>> in a timely manner, you should release the writers every now and then
>>> (release and re-get the Writer every 100 docs or something). An
>>> interactive index should not hog the Writer, while something that is just
>>> loading a lot could hog the Writer. This is no different from plain
>>> Lucene: you cannot delete with a Reader while adding with a Writer. This
>>> code just enforces those semantics. The best solution is to just use a
>>> Writer to delete; I never get a WritingReader.
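
[Editorial sketch, not part of the original message.] The "release and
re-get the Writer every 100 docs" advice could look roughly like the code
below. It is only an illustration: WriterLease is a hypothetical stand-in
for the accessor's get/release pair, BatchLoader and its parameters are
made-up names, and nothing here is taken from the LuceneIndexAccessor API.
The writer and document types are generic so the example runs without
Lucene.

```java
import java.util.function.BiConsumer;

/** Hypothetical stand-in for an accessor's getWriter/release pair. */
interface WriterLease<W> {
    W getWriter();
    void release(W writer);
}

final class BatchLoader {
    private BatchLoader() {}

    /**
     * Adds every document, releasing the writer after each full batch so
     * that callers blocked on a deleting reader get a chance to run.
     * Returns how many times the writer was released.
     */
    static <W, D> int load(WriterLease<W> lease, Iterable<D> docs,
                           int batchSize, BiConsumer<W, D> addDocument) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int releases = 0;
        int inBatch = 0;
        W writer = null;
        try {
            for (D doc : docs) {
                if (writer == null) writer = lease.getWriter(); // (re-)get lazily
                addDocument.accept(writer, doc);
                if (++inBatch == batchSize) {
                    W full = writer;
                    writer = null;       // cleared first so finally cannot double-release
                    inBatch = 0;
                    lease.release(full); // let waiting readers in
                    releases++;
                }
            }
        } finally {
            if (writer != null) {        // release the last, partial batch
                lease.release(writer);
                releases++;
            }
        }
        return releases;
    }
}
```

A smaller batch size shortens the longest time a deleting reader can be
blocked, at the cost of more get/release round trips; the figure of 100
comes from the message above and is not a tuned value.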
>>>
>>> 2. http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3
>>>
>>> This is no big deal either. I just added another getWriter call that
>>> takes a boolean create flag.
>>>
>>> 3. I don't think there is a latest release. This has never gotten much
>>> official attention and is not in the sandbox. I worked straight from the
>>> originally submitted code.
>>>
>>> 4. I will look into getting together some code that I can share. The
>>> MultiSearcher changes that are needed are a couple of one-liners really,
>>> so at a minimum I will give you the changes needed.
>>>
>>> -       Mark
>>>
>>> On 9/19/07, Jay Yu <yu@ai.sri.com> wrote:
>>>
>>> Mark,
>>>
>>> Thanks for sharing your insight and experience about LuceneIndexAccessor!
>>> I remember seeing some people reporting some issues about it, such as:
>>>
>>> http://www.archivum.info/java-dev@lucene.apache.org/2005-05/msg00114.html
>>> http://issues.apache.org/bugzilla/show_bug.cgi?id=34995#c3
>>>
>>> Have those issues been resolved?
>>>
>>> Where did you get the latest release? It is not in the official Lucene
>>> sandbox/contrib.
>>>
>>> Finally, are you willing to share your extended version to include your
>>> tweak relating to the MultiSearcher?
>>>
>>> Thanks a lot!
>>>
>>> Jay
>>>
>>> Mark Miller wrote:
>>>> I use option 3 extensively and find it very effective. There is a tweak
>>>> or two required to get it to work right with MultiSearchers, but other
>>>> than that, the code is great. I have built a lot on top of it. I'm on
>>>> the list all the time and would be happy to answer any questions you
>>>> have regarding LuceneIndexAccessor. Frankly, I think it's overlooked
>>>> far too much.
>>>>
>>>> - Mark
>>>>
>>>> On 9/19/07, Jay Yu <yu@ai.sri.com> wrote:
>>>>> In a multithreaded app such as a web app, a shared IndexSearcher can
>>>>> throw an AlreadyClosedException when another thread updates the
>>>>> underlying IndexReader by closing the shared searcher after the index
>>>>> is updated. Searching the past discussions on this mailing list, I
>>>>> found several approaches to solving the problem:
>>>>> 1. use Solr
>>>>> 2. use DelayCloseIndexSearcher
>>>>> 3. use LuceneIndexAccessor
>>>>>
>>>>> The first one is not feasible for us; some people seem to have had
>>>>> problems with No. 2, and I cannot find much discussion of No. 3.
>>>>>
>>>>> Does anyone have good experience with No. 2 or No. 3?
>>>>> Or am I missing other, better solutions?
>>>>>
>>>>> Thanks for any suggestions/comments!
>>>>>
>>>>> Jay
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org