From: Grant Ingersoll
To: java-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-687) Performance improvement: Lazy skipping on proximity file
Date: Wed, 18 Oct 2006 16:18:20 -0400
Message-Id: <3310D485-78B2-4958-BDB3-439E836464A3@apache.org>
In-Reply-To: <8098507.1161200497478.JavaMail.jira@brutus>

Can you share your performance test as well as the results?

http://issues.apache.org/jira/browse/LUCENE-675

Thanks,
Grant

On Oct 18, 2006, at 3:41 PM, Michael Busch (JIRA) wrote:

> [ http://issues.apache.org/jira/browse/LUCENE-687?page=comments#action_12443343 ]
>
> Michael Busch commented on LUCENE-687:
> --------------------------------------
>
> Hi Yonik,
>
> thanks for the quick reply! I'm going to do performance tests and
> will give you some numbers soon.
>
>> Performance improvement: Lazy skipping on proximity file
>> --------------------------------------------------------
>>
>>         Key: LUCENE-687
>>         URL: http://issues.apache.org/jira/browse/LUCENE-687
>>     Project: Lucene - Java
>>  Issue Type: Improvement
>>  Components: Index
>>    Reporter: Michael Busch
>>    Priority: Minor
>> Attachments: lazy_prox_skipping.patch
>>
>>
>> Hello,
>> I'm proposing a patch here that changes
>> org.apache.lucene.index.SegmentTermPositions to avoid unnecessary
>> skips and reads on the proximity stream. Currently a call of
>> next() or seek(), which causes a movement to a document in the freq
>> file, also moves the prox pointer to the posting list of that
>> document. But this is only necessary if actual positions have to
>> be retrieved for that particular document.
>> Consider for example a phrase query with two terms: the freq
>> pointer for term 1 has to move to document x to answer the
>> question if the term occurs in that document. But *only* if term 2
>> also matches document x, the positions have to be read to figure
>> out if term 1 and term 2 appear next to each other in document x
>> and thus satisfy the query.
>> A move to the posting list of a document can be quite expensive.
>> The pointer has to skip to the last skip point before that
>> document, and then the documents between the skip point and the
>> desired document have to be scanned, which means that the VInts of
>> all positions of those documents have to be read and decoded.
>> An improvement is to move the prox pointer lazily to a document
>> only if nextPosition() is called. This will become even more
>> important in the future when the size of the proximity file
>> increases (e.g. by adding payloads to the posting lists).
>> My patch implements this lazy skipping. All unit tests pass.
>> I also attach a new unit test that works as follows:
>> Using a RAMDirectory an index is created and test docs are added.
>> Then the index is optimized to make sure it only has a single
>> segment. This is important, because IndexReader.open() returns an
>> instance of SegmentReader if there is only one segment in the
>> index. The proxStream instance of SegmentReader is package
>> protected, so it is possible to set proxStream to a different
>> object. I am using a class called SeeksCountingStream that extends
>> IndexInput in a way that it is able to count the number of
>> invocations of seek().
>> Then the testcase searches the index using a PhraseQuery "term1
>> term2". It is known how many documents match that query, and the
>> testcase can verify that seek() on the proxStream is not called
>> more often than the number of search hits.
>> Example:
>>   Number of docs in the index: 500
>>   Number of docs that match the query "term1 term2": 5
>>   Invocations of seek on prox stream (old code): 29
>>   Invocations of seek on prox stream (patched version): 5
>> - Michael
>
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the
> administrators: http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
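
For readers following the thread, the lazy skipping idea described in
the quoted issue can be illustrated with a small self-contained Java
sketch. This is not the LUCENE-687 patch and it uses no Lucene classes:
LazyProxSketch, CountingProxStream and TermCursor are hypothetical names
made up for illustration. The model is deliberately simplified: each
term occurs exactly once per document, there are no skip lists, and
every movement of the prox pointer is counted as one seek().

```java
// Hypothetical sketch: none of these classes are Lucene classes or the LUCENE-687 patch.
final class LazyProxSketch {

    /** Stand-in for the prox stream; counts seek() calls, like a seek-counting test stream. */
    static final class CountingProxStream {
        private final int[] positions; // one position per document, in posting order
        private int pointer = 0;
        int seekCount = 0;

        CountingProxStream(int[] positions) { this.positions = positions; }

        void seek(int newPointer) { seekCount++; pointer = newPointer; }

        int readPosition() { return positions[pointer++]; }
    }

    /** Cursor over one term's postings; docs[i] has its single position at prox index i. */
    static final class TermCursor {
        private final int[] docs;
        final CountingProxStream prox;
        private final boolean lazy;
        private int index = -1;
        private boolean proxStale = false;

        TermCursor(int[] docs, int[] positions, boolean lazy) {
            this.docs = docs;
            this.prox = new CountingProxStream(positions);
            this.lazy = lazy;
        }

        /** Moves the freq pointer to the next document. */
        boolean next() {
            index++;
            if (index >= docs.length) return false;
            if (lazy) proxStale = true;  // lazy: only remember that the prox pointer is behind
            else prox.seek(index);       // eager: move the prox pointer right away
            return true;
        }

        int doc() { return docs[index]; }

        /** Reads the position in the current document, catching up the prox pointer if needed. */
        int nextPosition() {
            if (proxStale) {
                prox.seek(index);
                proxStale = false;
            }
            return prox.readPosition();
        }
    }

    /** Runs the phrase "term1 term2"; returns {hits, seeks on term1's prox stream}. */
    static int[] run(boolean lazy) {
        TermCursor t1 = new TermCursor(new int[] {0, 1, 2, 3, 4, 5},
                                       new int[] {3, 7, 1, 4, 0, 9}, lazy);
        TermCursor t2 = new TermCursor(new int[] {2, 5}, new int[] {2, 10}, lazy);
        int hits = 0;
        boolean has1 = t1.next();
        boolean has2 = t2.next();
        while (has1 && has2) {
            if (t1.doc() < t2.doc()) {
                has1 = t1.next();
            } else if (t1.doc() > t2.doc()) {
                has2 = t2.next();
            } else {
                // Positions are only needed once both terms are on the same document.
                if (t2.nextPosition() == t1.nextPosition() + 1) hits++;
                has1 = t1.next();
                has2 = t2.next();
            }
        }
        return new int[] {hits, t1.prox.seekCount};
    }

    public static void main(String[] args) {
        int[] eager = run(false);
        int[] lazy = run(true);
        System.out.println("eager: hits=" + eager[0] + " seeks=" + eager[1]);
        System.out.println("lazy:  hits=" + lazy[0] + " seeks=" + lazy[1]);
    }
}
```

With this toy data both modes find the same 2 hits, but the eager cursor
seeks term1's prox stream 6 times (once per document visited) while the
lazy cursor seeks it 2 times (once per document where positions are
actually read). The numbers only illustrate the mechanism; they say
nothing about the real performance of the patch.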