lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: thread safe shared IndexSearcher
Date Tue, 25 Sep 2007 18:24:47 GMT
I think its just a compromise in the design, though it could be 
improved. You only ever want a single Writer at a time on the index. 
Those two flags are really just hints for when a Writer is first 
opened...should it auto-commit and should it overwrite/create...if a 
thread tries to writer concurrently with another thread, they will 
briefly share a Writer, but generally a new Writer is created fairly often.

The general strategy should be to pick constant values and always pass 
them. There is an opening for the issue that you have a Writer and are 
adding a doc, and then before releasing that Writer, another Writer from 
another thread tries to clear the index with a create=true, and it won't 
work. That's not a big concern though.

So the problem really is that these params control what happens when a 
new writer is created, but your not guaranteed to be creating a Writer, 
it may be cached. You really should pass the same autocommit flag , 
though its not necessary. I am open to suggestions for a more coherent 
design, but functionally, it does work. I am also thinking about how to 
handle the Analyzer, and I think the solution (the need to init some 
indexaccessor params) might involve all these issues.

- Mark

Jay Yu wrote:
> Mark,
>
> Looking at your implementation of the DefaultIndexAccessor regarding 
> the writer, I think there could be a problem: you have only one cached 
> writer but the getWriter(boolean, boolean) allows 2 booleans, so 
> ideally, you need 4 cached writer. Otherwise if one starts with a 
> writer that over writes the existing index, then later he cannot 
> append docs to the index.
> Do I miss sth here or you have not finished the implementation of 
> getWriter yet?
>
> Thanks!
>
> Jay
>
> Mark Miller wrote:
>> Ah, thanks for catching that. One of the pieces I did not 
>> finish...the keyword analyzer was placeholder code.
>>
>> I will take your comments into account and update the code.
>>
>> I have some other pieces to polish as well. Previously, I extended 
>> and built upon the original code, but I can't give it away, so this 
>> is my attempt at something lessor, but cleaner.
>>
>> Jay Yu wrote:
>>> Thanks for the tip.
>>> One small improvement on the IndexAccessorFactory might be to allow 
>>> user to specify the Analyzer instead of using a default 
>>> KeywordAnalyzer, which of course will make your static init of the 
>>> cached accessors difficult unless you add more interfaces to the 
>>> accessor to allow reset analyzer/Dir as in my own version.
>>>
>>>
>>>
>>>
>>> Jay
>>>
>>> Mark Miller wrote:
>>>> One final note....if you are using the IndexAccessor and you are 
>>>> only accessing the index from one JVM, you can use the 
>>>> NoLockFactory and save some sync cost there.
>>>>
>>>> Jay Yu wrote:
>>>>> Mark,
>>>>>
>>>>> Great effort getting the original lucene index accessor package in 
>>>>> this shape. I am sure this will benefit a lot of people using 
>>>>> Lucene in a multithread env.
>>>>> I have a quick question to ask you:
>>>>> Do you have to use the core Lucene 2.3-dev in order to use the 
>>>>> accessor?
>>>>>
>>>>> I will take a look at your codes to see if I could help. I used a 
>>>>> slightly modified version of the original package in my project 
>>>>> but it breaks some of my tests. I hope your version works better.
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> Jay
>>>>>
>>>>>
>>>>> Mark Miller wrote:
>>>>>> I have sat down and rewrote IndexAccessor from scratch. I copied

>>>>>> in the same reference counting logic, pruned some things, and 
>>>>>> tried to make the whole package a bit simpler to use. I have a 
>>>>>> few things to do, but its pretty solid already. The only major 
>>>>>> thing I'd still like to do is add an option to warm searchers 
>>>>>> before putting them in the Searcher cache. Id like to writer some

>>>>>> more tests as well. Any help greatly appreciated if your 
>>>>>> interested in using the thing.
>>>>>>
>>>>>>
>>>>>> http://myhardshadow.com/indexaccessor/trunk/src/test/com/mhs/indexaccessor/SimpleSearchServer.java

>>>>>>
>>>>>>
>>>>>> Here is a an example of a class that can be instantiated in one 
>>>>>> of multiple threads and read /modify a single index without 
>>>>>> worrying about what any
>>>>>> of the other threads are doing to the index at any given time. 
>>>>>> This is a very simple example of how to use the IndexAccessor and

>>>>>> not necessarily an
>>>>>> example of best practices. The main idea is that you get your 
>>>>>> Writer, Searcher, or Reader, and then be sure to release it as 
>>>>>> soon as your done with it
>>>>>> in a finally block. For loading, you will want to load many docs

>>>>>> with a Writer (batch them) before releasing it, but remember that

>>>>>> Readers will not get a new view
>>>>>> of the index until you release all of the Writers. So beware 
>>>>>> hogging a Writer unless you thats what your intending.
>>>>>>
>>>>>> JavaDoc:
>>>>>> http://myhardshadow.com/indexaccessorapi/
>>>>>>
>>>>>> Code:
>>>>>> http://myhardshadow.com/indexaccessor/trunk/
>>>>>>
>>>>>> Jar:
>>>>>> http://myhardshadow.com/indexaccessorreleases/indexaccessor.jar
>>>>>>
>>>>>>
>>>>>> Your synchronized block concerns:
>>>>>>
>>>>>> The synchronized blocks that control accesss to the IndexAccessor

>>>>>> do not have a huge impact on performance. Keep in mind that all 
>>>>>> of the work is not done in a synchonrized block, just the 
>>>>>> retrieval of the Searcher, Writer, Reader. Even if the 
>>>>>> synchronization makes the method twice as expensive, it is still

>>>>>> overpowered by the cost of parsing queries and searching the 
>>>>>> index. This applies with or without contention. I wrote a simple

>>>>>> test and included the output below. You might use the IBM Lock 
>>>>>> Analyzer for Java to further analyze these costs. Trust me, this

>>>>>> thing is speedy. Its many times better than using IndexModifier.
>>>>>>
>>>>>> Without Contention
>>>>>> Just retrieve and release Searcher 100000 times
>>>>>> ----
>>>>>> avg time:6.3E-4 ms
>>>>>> total time:63 ms
>>>>>>
>>>>>> Parse query and search on 1 doc 100000 times
>>>>>> ----
>>>>>> avg time:0.03107 ms
>>>>>> total time:3107 ms
>>>>>>
>>>>>>
>>>>>> With Contention (40 other threads running 80000 searches)
>>>>>> Just retrieve and release Searcher 100000 times
>>>>>> ----
>>>>>> avg time:0.04643 ms
>>>>>> total time:4643 ms
>>>>>>
>>>>>> Parse query and search on 1 doc 100000 times
>>>>>> ----
>>>>>> avg time:0.64337 ms
>>>>>> total time:64337 ms
>>>>>>
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>> ---------------------------------------------------------------------

>>>>>>
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message