Return-Path: Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: (qmail 66302 invoked from network); 25 Sep 2007 19:56:55 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 25 Sep 2007 19:56:55 -0000 Received: (qmail 10088 invoked by uid 500); 25 Sep 2007 19:56:39 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 10057 invoked by uid 500); 25 Sep 2007 19:56:38 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 10046 invoked by uid 99); 25 Sep 2007 19:56:38 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Sep 2007 12:56:38 -0700 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of markrmiller@gmail.com designates 64.233.166.181 as permitted sender) Received: from [64.233.166.181] (HELO py-out-1112.google.com) (64.233.166.181) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Sep 2007 19:56:36 +0000 Received: by py-out-1112.google.com with SMTP id d32so6937689pye for ; Tue, 25 Sep 2007 12:56:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:user-agent:mime-version:to:subject:references:in-reply-to:content-type:content-transfer-encoding; bh=tRtUG76VqG1+XCyjoBqrSE7qm3mDnwBR6jlsPwAmitA=; b=gKderovuhGMKabCZHcFrDdS6pJdVGsr/Bch4LTsBfOw5qxZTYUa1El/W194fAn/f79tZPCIpe2XwRsbT6jLsQ7AcbYSIIZf+kfF4Jtcx18iuWFwrG8TMquMtWwNlrK/+accjOs98F6p/6OVud6q6ayG5D/ohLAfwiiHf8QIRDG4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:user-agent:mime-version:to:subject:references:in-reply-to:content-type:content-transfer-encoding; b=LKYfp/cHfI4fRAyQiAjU4Gw3wWvJkrG1BW4qNH3wJaLaEsU6Pth3kvWX2b1GwXR9pZbm3YvAW3g8WBlwwDAotMUelFTMTTuv0AWExLbDNjC6+RIUUjDEDevKYSxDUa7AuO2sGBRnsZ1vMTfXNoIchbc3K/Q+B6xP1eAyskrIPwk= Received: by 10.64.156.2 with SMTP id d2mr10703631qbe.1190750174526; Tue, 25 Sep 2007 12:56:14 -0700 (PDT) Received: from ?192.168.1.108? ( [69.124.234.183]) by mx.google.com with ESMTPS id f16sm3112121qba.2007.09.25.12.56.11 (version=SSLv3 cipher=RC4-MD5); Tue, 25 Sep 2007 12:56:12 -0700 (PDT) Message-ID: <46F9676D.8020102@gmail.com> Date: Tue, 25 Sep 2007 15:54:21 -0400 From: Mark Miller User-Agent: Thunderbird 2.0.0.6 (Windows/20070728) MIME-Version: 1.0 To: java-user@lucene.apache.org Subject: Re: thread safe shared IndexSearcher References: <46F15AE1.5040400@ai.sri.com> <46F171FB.2080306@ai.sri.com> <46F1A8B5.1030401@ai.sri.com> <46F1D862.8000201@gmail.com> <46F2959D.9040403@ai.sri.com> <46F29849.4070701@gmail.com> <46F299A4.6030403@ai.sri.com> <46F7A320.9030100@gmail.com> <46F7E9F9.3010103@ai.sri.com> <46F8366F.1090806@gmail.com> <46F83E84.2000102@ai.sri.com> <46F8419E.7080501@gmail.com> <46F843C8.30604@ai.sri.com> <46F844EC.9060904@gmail.com> <46F949A1.90109@ai.sri.com> <46F9526F.1030702@gmail.com> <46F9598E.2090201@ai.sri.com> In-Reply-To: <46F9598E.2090201@ai.sri.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org Agreed. Perhaps I will abandon the static init. I really only put it as an option due to your synchronized cost concerns (a preload allows non synched read only access to the indexaccessor cache). Due keep in mind that you don't have to use it though...if you dont preload, accessors are created on demand but require you to go through a synch block. I have some ideas and I will be making an attempt to smooth this all out tonight. Thanks for your input. - Mark Jay Yu wrote: > I agree with you on the compromise aspect of the design. > In particular, I think it's hard to preload all the index accessors in > the static init while allowing users specify the analyzer for each dir > without requiring complicated config file ans using reflection. > So a good compromise might be abandon preload the accessors. After > all, the accessors are cached and not created often. > > Thanks! > > Jay > > > Mark Miller wrote: >> I think its just a compromise in the design, though it could be >> improved. You only ever want a single Writer at a time on the index. >> Those two flags are really just hints for when a Writer is first >> opened...should it auto-commit and should it overwrite/create...if a >> thread tries to writer concurrently with another thread, they will >> briefly share a Writer, but generally a new Writer is created fairly >> often. >> >> The general strategy should be to pick constant values and always >> pass them. There is an opening for the issue that you have a Writer >> and are adding a doc, and then before releasing that Writer, another >> Writer from another thread tries to clear the index with a >> create=true, and it won't work. That's not a big concern though. >> >> So the problem really is that these params control what happens when >> a new writer is created, but your not guaranteed to be creating a >> Writer, it may be cached. You really should pass the same autocommit >> flag , though its not necessary. I am open to suggestions for a more >> coherent design, but functionally, it does work. I am also thinking >> about how to handle the Analyzer, and I think the solution (the need >> to init some indexaccessor params) might involve all these issues. >> >> - Mark >> >> Jay Yu wrote: >>> Mark, >>> >>> Looking at your implementation of the DefaultIndexAccessor regarding >>> the writer, I think there could be a problem: you have only one >>> cached writer but the getWriter(boolean, boolean) allows 2 booleans, >>> so ideally, you need 4 cached writer. Otherwise if one starts with a >>> writer that over writes the existing index, then later he cannot >>> append docs to the index. >>> Do I miss sth here or you have not finished the implementation of >>> getWriter yet? >>> >>> Thanks! >>> >>> Jay >>> >>> Mark Miller wrote: >>>> Ah, thanks for catching that. One of the pieces I did not >>>> finish...the keyword analyzer was placeholder code. >>>> >>>> I will take your comments into account and update the code. >>>> >>>> I have some other pieces to polish as well. Previously, I extended >>>> and built upon the original code, but I can't give it away, so this >>>> is my attempt at something lessor, but cleaner. >>>> >>>> Jay Yu wrote: >>>>> Thanks for the tip. >>>>> One small improvement on the IndexAccessorFactory might be to >>>>> allow user to specify the Analyzer instead of using a default >>>>> KeywordAnalyzer, which of course will make your static init of the >>>>> cached accessors difficult unless you add more interfaces to the >>>>> accessor to allow reset analyzer/Dir as in my own version. >>>>> >>>>> >>>>> >>>>> >>>>> Jay >>>>> >>>>> Mark Miller wrote: >>>>>> One final note....if you are using the IndexAccessor and you are >>>>>> only accessing the index from one JVM, you can use the >>>>>> NoLockFactory and save some sync cost there. >>>>>> >>>>>> Jay Yu wrote: >>>>>>> Mark, >>>>>>> >>>>>>> Great effort getting the original lucene index accessor package >>>>>>> in this shape. I am sure this will benefit a lot of people using >>>>>>> Lucene in a multithread env. >>>>>>> I have a quick question to ask you: >>>>>>> Do you have to use the core Lucene 2.3-dev in order to use the >>>>>>> accessor? >>>>>>> >>>>>>> I will take a look at your codes to see if I could help. I used >>>>>>> a slightly modified version of the original package in my >>>>>>> project but it breaks some of my tests. I hope your version >>>>>>> works better. >>>>>>> >>>>>>> Thanks a lot! >>>>>>> >>>>>>> Jay >>>>>>> >>>>>>> >>>>>>> Mark Miller wrote: >>>>>>>> I have sat down and rewrote IndexAccessor from scratch. I >>>>>>>> copied in the same reference counting logic, pruned some >>>>>>>> things, and tried to make the whole package a bit simpler to >>>>>>>> use. I have a few things to do, but its pretty solid already. >>>>>>>> The only major thing I'd still like to do is add an option to >>>>>>>> warm searchers before putting them in the Searcher cache. Id >>>>>>>> like to writer some more tests as well. Any help greatly >>>>>>>> appreciated if your interested in using the thing. >>>>>>>> >>>>>>>> >>>>>>>> http://myhardshadow.com/indexaccessor/trunk/src/test/com/mhs/indexaccessor/SimpleSearchServer.java >>>>>>>> >>>>>>>> >>>>>>>> Here is a an example of a class that can be instantiated in one >>>>>>>> of multiple threads and read /modify a single index without >>>>>>>> worrying about what any >>>>>>>> of the other threads are doing to the index at any given time. >>>>>>>> This is a very simple example of how to use the IndexAccessor >>>>>>>> and not necessarily an >>>>>>>> example of best practices. The main idea is that you get your >>>>>>>> Writer, Searcher, or Reader, and then be sure to release it as >>>>>>>> soon as your done with it >>>>>>>> in a finally block. For loading, you will want to load many >>>>>>>> docs with a Writer (batch them) before releasing it, but >>>>>>>> remember that Readers will not get a new view >>>>>>>> of the index until you release all of the Writers. So beware >>>>>>>> hogging a Writer unless you thats what your intending. >>>>>>>> >>>>>>>> JavaDoc: >>>>>>>> http://myhardshadow.com/indexaccessorapi/ >>>>>>>> >>>>>>>> Code: >>>>>>>> http://myhardshadow.com/indexaccessor/trunk/ >>>>>>>> >>>>>>>> Jar: >>>>>>>> http://myhardshadow.com/indexaccessorreleases/indexaccessor.jar >>>>>>>> >>>>>>>> >>>>>>>> Your synchronized block concerns: >>>>>>>> >>>>>>>> The synchronized blocks that control accesss to the >>>>>>>> IndexAccessor do not have a huge impact on performance. Keep in >>>>>>>> mind that all of the work is not done in a synchonrized block, >>>>>>>> just the retrieval of the Searcher, Writer, Reader. Even if the >>>>>>>> synchronization makes the method twice as expensive, it is >>>>>>>> still overpowered by the cost of parsing queries and searching >>>>>>>> the index. This applies with or without contention. I wrote a >>>>>>>> simple test and included the output below. You might use the >>>>>>>> IBM Lock Analyzer for Java to further analyze these costs. >>>>>>>> Trust me, this thing is speedy. Its many times better than >>>>>>>> using IndexModifier. >>>>>>>> >>>>>>>> Without Contention >>>>>>>> Just retrieve and release Searcher 100000 times >>>>>>>> ---- >>>>>>>> avg time:6.3E-4 ms >>>>>>>> total time:63 ms >>>>>>>> >>>>>>>> Parse query and search on 1 doc 100000 times >>>>>>>> ---- >>>>>>>> avg time:0.03107 ms >>>>>>>> total time:3107 ms >>>>>>>> >>>>>>>> >>>>>>>> With Contention (40 other threads running 80000 searches) >>>>>>>> Just retrieve and release Searcher 100000 times >>>>>>>> ---- >>>>>>>> avg time:0.04643 ms >>>>>>>> total time:4643 ms >>>>>>>> >>>>>>>> Parse query and search on 1 doc 100000 times >>>>>>>> ---- >>>>>>>> avg time:0.64337 ms >>>>>>>> total time:64337 ms >>>>>>>> >>>>>>>> >>>>>>>> - Mark >>>>>>>> >>>>>>>> --------------------------------------------------------------------- >>>>>>>> >>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org >>>>>>> >>>>>>> --------------------------------------------------------------------- >>>>>>> >>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org >>>>>>> >>>>>>> >>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>>>> For additional commands, e-mail: java-user-help@lucene.apache.org >>>>> >>>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>>> For additional commands, e-mail: java-user-help@lucene.apache.org >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >>> For additional commands, e-mail: java-user-help@lucene.apache.org >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >> For additional commands, e-mail: java-user-help@lucene.apache.org > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > For additional commands, e-mail: java-user-help@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org